How to profile a vector outer product in MATLAB

During my MATLAB profiling, I noticed one line of code that consumes much more time than I expected. Any idea how to make it faster?
X = Y(ids_A, ids_A) - (Y(ids_A,k) * Y(k,ids_A))/Y(k,k);
X and Y are symmetric matrices with the same size (dxd), k is the index of a single row/column in Y, and ids_A is a vector of the indices of all the other rows/columns (therefore Y(ids_A,k) is a column vector and Y(k,ids_A) is a row vector):
ids_A = setxor(1:d,k);
Thanks!

You can perhaps replace the outer product multiplication with a call to bsxfun:
X = Y(ids_A, ids_A) - (bsxfun(@times, Y(ids_A,k), Y(k,ids_A))/Y(k,k));
So how does the above code work? Let's take a look at the definition of the outer product when one vector is 4 elements and the other 3 elements:
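$$\mathbf{u} \otimes \mathbf{v} = \mathbf{u}\mathbf{v}^{\mathsf{T}} = \begin{bmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \end{bmatrix} \begin{bmatrix} v_1 & v_2 & v_3 \end{bmatrix} = \begin{bmatrix} u_1 v_1 & u_1 v_2 & u_1 v_3 \\ u_2 v_1 & u_2 v_2 & u_2 v_3 \\ u_3 v_1 & u_3 v_2 & u_3 v_3 \\ u_4 v_1 & u_4 v_2 & u_4 v_3 \end{bmatrix}$$
Source: Wikipedia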
As you can see, the outer product is created by element-wise products where the first vector u is replicated horizontally while the second vector v is replicated vertically. You then find the element-wise products of each element to produce your result. This is eloquently done with bsxfun:
bsxfun(@times, u, v.');
u would be a column vector and v.' would be a row vector. bsxfun naturally replicates the data to follow the above pattern, and then we use @times to perform the element-wise products.
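As a quick sanity check of the claim above, the following sketch compares the bsxfun form against the ordinary matrix product for a 4-element and a 3-element vector. The values of u and v are made up for illustration and are not taken from the question.

```matlab
% Illustrative vectors (arbitrary values, not from the question)
u = [1; 2; 3; 4];   % 4-by-1 column vector
v = [10; 20; 30];   % 3-by-1 column vector

% Outer product via ordinary matrix multiplication
P_mtimes = u * v.';

% Same outer product via singleton expansion
P_bsxfun = bsxfun(@times, u, v.');

% Both results are 4-by-3 and should be identical
same = isequal(P_mtimes, P_bsxfun);
```

In MATLAB R2016b and later, u .* v.' should give the same result through implicit expansion, without the explicit bsxfun call.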

I am assuming your code to look something like this -
for k = 1:d
ids_A = setxor(1:d,k);
X = Y(ids_A, ids_A) - (Y(ids_A,k) * Y(k,ids_A))/Y(k,k);
end
With the given code snippet, it's safe to assume that you are somehow using X within that loop. You can calculate all the X matrices as a pre-calculation step before the start of such a loop and these calculations could be performed as a vectorized approach.
Regarding the code snippet itself, it can be seen that you are "escaping" one index at each iteration with setxor. Now, if you are going with a vectorized approach, you can perform all those mathematical operations in one-go and later on remove the elements that got incorporated in the vectorized approach, but weren't intended. This really is the essence of a bsxfun based vectorized approach listed next -
%// Perform all matrix-multiplications in one go with bsxfun and permute
mults = bsxfun(@times,permute(Y,[1 3 2]),permute(Y,[3 2 1]));
%// Scale those with diagonal elements from Y and get X for every iteration
scaledvals = bsxfun(@rdivide,mults,permute(Y(1:d+1:end),[1 3 2]));
X_vectorized = bsxfun(@minus,Y,scaledvals);
%// Find row and column indices as linear indices to be removed from X_vectorized
row_idx = bsxfun(@plus,[0:d-1]*d+1,[0:d-1]'*(d*d+1));
col_idx = bsxfun(@plus,[1:d]',[0:d-1]*(d*(d+1)));
%// Remove those "setxored" indices and then reshape to expected size
X_vectorized([row_idx col_idx])=[];
X_vectorized = reshape(X_vectorized,d-1,d-1,d);
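Before relying on the vectorized version, it may be worth confirming on a small input that page k of X_vectorized matches the X produced by iteration k of the loop. The sketch below does that; the size d = 6 and the way Y is built (symmetric, with a diagonal shifted away from zero) are assumptions made only for this check.

```matlab
d = 6;                      % small size chosen only for the check
Y = rand(d) + d*eye(d);     % random matrix with a clearly nonzero diagonal
Y = (Y + Y.')/2;            % symmetrize, as described in the question

% Vectorized computation (same steps as listed above)
mults = bsxfun(@times, permute(Y,[1 3 2]), permute(Y,[3 2 1]));
scaledvals = bsxfun(@rdivide, mults, permute(Y(1:d+1:end),[1 3 2]));
X_vectorized = bsxfun(@minus, Y, scaledvals);
row_idx = bsxfun(@plus, (0:d-1)*d+1, (0:d-1)'*(d*d+1));
col_idx = bsxfun(@plus, (1:d)', (0:d-1)*(d*(d+1)));
X_vectorized([row_idx col_idx]) = [];
X_vectorized = reshape(X_vectorized, d-1, d-1, d);

% Loop-based reference: compare each X against the matching page
max_err = 0;
for k = 1:d
    ids_A = setxor(1:d, k);
    X = Y(ids_A, ids_A) - (Y(ids_A,k) * Y(k,ids_A)) / Y(k,k);
    max_err = max(max_err, max(max(abs(X - X_vectorized(:,:,k)))));
end
```

Here max_err should be at the level of floating-point round-off.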
Benchmarking
Benchmarking Code
d = 50; %// Datasize
Y = rand(d,d); %// Create random input
num_iter = 100; %// Number of iterations to be run for each approach
%// Warm up tic/toc.
for k = 1:100000
tic(); elapsed = toc();
end
disp('------------------------------ With original loopy approach')
tic
for iter = 1:num_iter
for k = 1:d
ids_A = setxor(1:d,k);
X = Y(ids_A, ids_A) - (Y(ids_A,k) * Y(k,ids_A))/Y(k,k);
end
end
toc
clear X k ids_A
disp('------------------------------ With proposed vectorized approach')
tic
for iter = 1:num_iter
mults = bsxfun(@times,permute(Y,[1 3 2]),permute(Y,[3 2 1]));
scaledvals = bsxfun(@rdivide,mults,permute(Y(1:d+1:end),[1 3 2]));
X_vectorized = bsxfun(@minus,Y,scaledvals);
row_idx = bsxfun(@plus,[0:d-1]*d+1,[0:d-1]'*(d*d+1));
col_idx = bsxfun(@plus,[1:d]',[0:d-1]*(d*(d+1)));
X_vectorized([row_idx col_idx])=[];
X_vectorized = reshape(X_vectorized,d-1,d-1,d);
end
toc
Results
Case #1: d = 50
------------------------------ With original loopy approach
Elapsed time is 0.849518 seconds.
------------------------------ With proposed vectorized approach
Elapsed time is 0.154395 seconds.
Case #2: d = 100
------------------------------ With original loopy approach
Elapsed time is 2.079886 seconds.
------------------------------ With proposed vectorized approach
Elapsed time is 2.285884 seconds.
Case #3: d = 200
------------------------------ With original loopy approach
Elapsed time is 7.592865 seconds.
------------------------------ With proposed vectorized approach
Elapsed time is 19.012421 seconds.
Conclusions
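The proposed vectorized approach is clearly faster at d = 50, already slightly slower at d = 100, and about 2.5 times slower at d = 200. So it is worth considering only for matrices smaller than roughly 100 x 100; beyond that, the memory-hungry bsxfun calls, which build d x d x d intermediate arrays, slow us down.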

Related

Calculate a function over all permutations of columns

I have this code:
abs(mean(exp(1i*( a(:,1) - a(:,2) ))))
where a is a 550-by-129 double matrix. How can I write code using that code to replace a(:,1) with a(:,2) and then a(:,3) and so on because I need each column to subtract from every other column?
Another method using matrix multiplication:
E = exp(1i*a);
result = abs(E.'*(1./E)/size(E,1));
Explanation:
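You can rewrite the expression
exp(1i*(a - b))
as
exp(1i*a)./exp(1i*b)
that is,
exp(1i*a).*(1./exp(1i*b))
and mean(x) is sum(x)/n.
The sum over the rows of an element-wise product of two columns is their (non-conjugated) dot product, so all pairs of columns can be handled at once by the very fast matrix multiplication E.'*(1./E).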
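To make the identity concrete, here is a small check that one entry of the matrix-multiplication result equals the original expression evaluated directly. The 20-by-5 size is an arbitrary stand-in for the 550-by-129 matrix in the question.

```matlab
a = rand(20, 5);            % small stand-in for the 550-by-129 matrix
n = size(a, 1);

% Matrix-multiplication form; note the non-conjugate transpose .'
E = exp(1i*a);
result = abs(E.' * (1./E) / n);

% Direct evaluation of the original expression for columns 1 and 2
ref_12 = abs(mean(exp(1i*(a(:,1) - a(:,2)))));
err_12 = abs(result(1,2) - ref_12);
```

Using ' instead of .' would conjugate E and compute a different quantity, so the dot is essential here. The diagonal of result is 1 up to round-off, since each column minus itself is zero.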
Result of a comparison between different methods in Octave:
Matrix Multiplication:
Elapsed time is 0.0133181 seconds.
BSXFUN:
Elapsed time is 1.33882 seconds.
REPMAT:
Elapsed time is 1.43535 seconds.
FOR LOOP:
Elapsed time is 3.10798 seconds.
Here is the code for comparing different methods.
Looped, this is an easy trick; let an outer loop run over all indices, and an inner loop as well.
a = rand(550,129);
out = zeros(size(a,2),size(a,2));
for ii = 1:size(a,2)
for jj = 1:size(a,2)
out(ii,jj) = abs(mean(exp(1i*(a(:,ii)-a(:,jj)))));
end
end
No loops, one line:
result = permute(abs(mean(exp(1i*bsxfun(@minus, a, permute(a, [1 3 2]))),1)), [2 3 1]);
This computes the differences for all pairs of columns as a 3D array, where the second and third dimensions refer to the two column indices in the original 2D array; then applies the required operations along the first dimension; and finally permutes the dimensions to yield a 2D array result.
a bit off-topic, but you can do that with indexing too
a = rand(550,129);
c = repmat(1:size(a,2),1,size(a,2));
c(2,:) = imresize(1:size(a,2), [1 length(c)], 'nearest');
out = abs(mean(exp(1i*( a(:,c(1,:)) - a(:,c(2,:)) ))));
out = reshape(out,[size(a,2) size(a,2)]); % 129x129 format

MATLAB: Block matrix multiplying without loops

I have a block matrix [A B C...] and a matrix D (all 2-dimensional). D has dimensions y-by-y, and A, B, C, etc are each z-by-y. Basically, what I want to compute is the matrix [D*(A'); D*(B'); D*(C');...], where X' refers to the transpose of X. However, I want to accomplish this without loops for speed considerations.
I have been playing with the reshape command for several hours now, and I know how to use it in other cases, but this use case is different from the other ones and I cannot figure it out. I also would like to avoid using multi-dimensional matrices if at all possible.
Honestly, a loop is probably the best way to do it. In my image-processing work I found a well-written loop that takes advantage of Matlab's JIT compiler is often faster than all the extra overhead of manipulating the data to be able to use a vectorised operation. A loop like this:
[m n] = size(A);
T = zeros(m, n);
AT = A';
for ii=1:m:n
T(:, ii:ii+m-1) = D * AT(ii:ii+m-1, :);
end
contains only built-in operators and the bare minimum of copying, and given the JIT is going to be hard to beat. Even if you want to factor in interpreter overhead it's still only a single statement with no functions to consider.
The "loop-free" version with extra faffing around and memory copying, is to split the matrix and iterate over the blocks with a hidden loop:
blksize = size(D, 1);
blkcnt = size(A, 2) / blksize;
blocks = mat2cell(A, blksize, repmat(blksize,1,blkcnt));
blocks = cellfun(@(x) D*x', blocks, 'UniformOutput', false);
T = cell2mat(blocks);
Of course, if you have access to the Image Processing Toolbox, you can also cheat horribly:
T = blockproc(A, size(D), @(x) D*x.data');
Prospective approach & Solution Code
Given:
M is the block matrix [A B C...], where each A, B, C etc. are of size z x y. Let the number of such matrices be num_mat for easy reference later on.
If those matrices are concatenated along the columns, then M would be of size z x num_mat*y.
D is the matrix to be multiplied with each of those matrices A, B, C etc. and is of size y x y.
Now, as stated in the problem, the output you are after is [D*(A'); D*(B'); D*(C');...], i.e. the multiplication results being concatenated along the rows.
If you are okay with those multiplication results being concatenated along the columns instead, i.e. [D*(A') D*(B') D*(C') ...], you can achieve the same with some reshaping and then performing the matrix multiplication for the entire M with D, and thus have a vectorized no-loop approach. To get such a matrix multiplication result, you can do -
mults = D*reshape(permute(reshape(M,z,y,[]),[2 1 3]),y,[]);
But, if you HAVE to get an output with the multiplication results being concatenated along the rows, you need to do some more reshaping like so -
out = reshape(permute(reshape(mults,y,z,[]),[1 3 2]),[],z);
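The reshape/permute chain is easy to get subtly wrong, so a small check against an explicit loop may help. The sketch below uses deliberately non-square blocks (z = 3, y = 4, five blocks); these sizes are assumptions chosen only to make dimension mix-ups visible.

```matlab
z = 3; y = 4; num_mat = 5;          % small, non-square blocks for the check
M = rand(z, num_mat*y);             % M = [A B C ...], each block z-by-y
D = rand(y, y);

% Vectorized version, as above
mults = D*reshape(permute(reshape(M,z,y,[]),[2 1 3]),y,[]);
out = reshape(permute(reshape(mults,y,z,[]),[1 3 2]),[],z);

% Loop reference: stack D*(block') along the rows
ref = zeros(y*num_mat, z);
for n = 1:num_mat
    blk = M(:, (n-1)*y+1 : n*y);            % n-th z-by-y block
    ref((n-1)*y+1 : n*y, :) = D * blk.';    % y-by-z result for this block
end
max_err = max(abs(out(:) - ref(:)));
```

With z different from y, any swapped dimension in the reshapes would either raise a size error or show up as a large max_err.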
Benchmarking
This section covers benchmarking codes comparing the proposed vectorized approach against a naive JIT powered loopy approach to get the desired output. As discussed earlier, depending on how the output array must hold the multiplication results, you can have two cases.
Case I: Multiplication results concatenated along the columns
%// Define size parameters and then define random inputs with those
z = 500; y = 500; num_mat = 500;
M = rand(z,num_mat*y);
D = rand(y,y);
%// Warm up tic/toc.
for k = 1:100000
tic(); elapsed = toc();
end
disp('---------------------------- With loopy approach')
tic
out1 = zeros(z,y*num_mat);
for k1 = 1:y:y*num_mat
out1(:,k1:k1+y-1) = D*M(:,k1:k1+y-1).'; %//'
end
toc, clear out1 k1
disp('---------------------------- With proposed approach')
tic
mults = D*reshape(permute(reshape(M,z,y,[]),[2 1 3]),y,[]);
toc
Case II: Multiplication results concatenated along the rows
%// Define size parameters and then define random inputs with those
z = 500; y = 500; num_mat = 500;
M = rand(z,num_mat*y);
D = rand(y,y);
%// Warm up tic/toc.
for k = 1:100000
tic(); elapsed = toc();
end
disp('---------------------------- With loopy approach')
tic
out1 = zeros(y*num_mat,z);
for k1 = 1:y:y*num_mat
out1(k1:k1+y-1,:) = D*M(:,k1:k1+y-1).'; %//'
end
toc, clear out1 k1
disp('---------------------------- With proposed approach')
tic
mults = D*reshape(permute(reshape(M,z,y,[]),[2 1 3]),y,[]);
out2 = reshape(permute(reshape(mults,y,z,[]),[1 3 2]),[],z);
toc
Runtimes
Case I:
---------------------------- With loopy approach
Elapsed time is 3.889852 seconds.
---------------------------- With proposed approach
Elapsed time is 3.051376 seconds.
Case II:
---------------------------- With loopy approach
Elapsed time is 3.798058 seconds.
---------------------------- With proposed approach
Elapsed time is 3.292559 seconds.
Conclusions
The runtimes suggest a speedup of roughly 27% in Case I and 15% in Case II with the proposed vectorized approach. So, hopefully this works out for you!
If you want to get A, B, and C from a bigger matrix you can do this, assuming the bigger matrix is called X:
A = X(:,1:y)
B = X(:,y+1:2*y)
C = X(:,2*y+1:3*y)
If there are N such matrices, the best way is to use reshape (here x is the number of rows of each block, i.e. z in the question):
F = reshape(X, x, y, N);
Then use a loop to generate a new matrix I call it F1 as:
F1=[];
for n=1:N
F1 = [F1 F(:,:,n)'];
end
Then compute F2 as:
F2 = D*F1;
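and finally get your result as (a plain reshape(F2,N*y,x) would interleave the blocks, so the block dimension has to be moved next to the rows first):
R = reshape(permute(reshape(F2, y, x, N), [1 3 2]), N*y, x);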
Note: this for loop does not slow you down as it is just to reformat the matrix and the multiplication is done in matrix form.

MATLAB/Octave - use cellfun to index one matrix with another

I have a cell containing a random number of matrices, say a = {[300*20],....,[300*20]};. I have another cell of the same format, call it b, that contains the logicals of the position of the nan terms in a.
I want to use cellfun to loop through the cell and basically let the nan terms equal to 0 i.e. a(b)=0.
Thanks,
j
You could define a function that replaces any NaN with zero.
function a = nan2zero(a)
a(isnan(a)) = 0;
Then you can use cellfun to apply this function to your cell array.
a0 = cellfun(@nan2zero, a, 'UniformOutput', 0)
That way, you don't even need any matrices b.
First, you should probably give the tick to @s.bandara, as that was the first correct answer and it used cellfun (as you requested). Do NOT give it to this answer. The purpose of this answer is to provide some additional analysis.
I thought I'd look into the efficiency of some of the possible approaches to this problem.
The first approach is the one advocated by @s.bandara.
The second approach is similar to the one advocated by @s.bandara, but it uses b to convert nan to 0, rather than using isnan. In theory, this method may be faster, since nothing is assigned to b inside the function, so it should be treated "By Ref".
The third approach uses a loop to get around using cellfun, since cellfun is often slower than an explicit loop.
The results of a quick speed test are:
Elapsed time is 3.882972 seconds. %# First approach (a, isnan, and cellfun, e.g. @s.bandara)
Elapsed time is 3.391190 seconds. %# Second approach (a, b, and cellfun)
Elapsed time is 3.041992 seconds. %# Third approach (loop-based solution)
In other words, there are (small) savings to be made by passing b in rather than using isnan. And there are further (small) savings to be made by using a loop rather than cellfun. But I wouldn't lose sleep over it. Remember, the results of any simulation are specific to the specified inputs.
Note, these results were consistent across several runs, I used tic and toc to do this, albeit with many loops over each method. If I wanted to be really thorough, I should use timeit from FEX. If anyone is interested, the code for the three methods follows:
%# Build some example matrices
T = 1000; N = 100; Q = 50; M = 100;
a = cell(1, Q); b = cell(1, Q);
for q = 1:Q
a{q} = randn(T, N);
b{q} = logical(randi(2, T, N) - 1);
a{q}(b{q}) = nan;
end
%# Solution using a, isnan, and cellfun (@s.bandara solution)
tic
for m = 1:M
Soln2 = cellfun(@f1, a, 'UniformOutput', 0);
end
toc
%# Solution using a, b, and cellfun
tic
for m = 1:M
Soln1 = cellfun(@f2, a, b, 'UniformOutput', 0);
end
toc
%# Solution using a loop to avoid cellfun
tic
for m = 1:M
Soln3 = cell(1, Q);
for q = 1:Q
Soln3{q} = a{q};
Soln3{q}(b{q}) = 0;
end
end
toc
%# Solution proposed by @EitanT
[K, N] = size(a{1});
tic
for m = 1:M
a0 = [a{:}]; %// Concatenate matrices along the 2nd dimension
a0(isnan(a0)) = 0; %// Replace NaNs with zeroes
Soln4 = mat2cell(a0, K, N * ones(size(a)));
end
toc
where:
function x1 = f1(x1)
x1(isnan(x1)) = 0;
and:
function x1 = f2(x1, x2)
x1(x2) = 0;
UPDATE: A fourth approach has been suggested by @EitanT. This approach concatenates the cell array of matrices into one large matrix, performs the operation on the large matrix, then optionally converts it back to a cell array. I have added the code for this procedure to my testing routine above. For the inputs specified in my testing code, i.e. T = 1000, N = 100, Q = 50, and M = 100, the timed run is as follows:
Elapsed time is 3.916690 seconds. %# @s.bandara
Elapsed time is 3.362319 seconds. %# a, b, and cellfun
Elapsed time is 2.906029 seconds. %# loop-based solution
Elapsed time is 4.986837 seconds. %# @EitanT
I was somewhat surprised by this as I thought the approach of @EitanT would yield the best results. On paper, it seems extremely sensible. Note, we can of course mess around with the input parameters to find specific settings that advantage different solutions. For example, if the matrices are small, but the number of them is large, then the approach of @EitanT does well, e.g. T = 10, N = 5, Q = 500, and M = 100 yields:
Elapsed time is 0.362377 seconds. %# @s.bandara
Elapsed time is 0.299595 seconds. %# a, b, and cellfun
Elapsed time is 0.352112 seconds. %# loop-based solution
Elapsed time is 0.030150 seconds. %# @EitanT
Here the approach of @EitanT dominates.
For the scale of the problem indicated by the OP, I found that the loop-based solution usually had the best performance. However, for some Q, e.g. Q = 5, the solution of @EitanT managed to edge ahead.
Hmm.
Given the nature of the contents of your cell array, there may exist an even faster solution: you can convert your cell data to a single matrix and use vector indexing to replace all NaN values in it at once, without the need of cellfun or loops:
a0 = [a{:}]; %// Concatenate matrices along the 2nd dimension
a0(isnan(a0)) = 0; %// Replace NaNs with zeroes
If you want to convert it back to a cell array, that's fine:
[M, N] = size(a{1});
mat2cell(a0, M, N * ones(size(a)))
P.S.
Work with a 3-D matrix instead of a cell array, if possible. Vectorized operations are usually much faster in MATLAB.
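A minimal sketch of the 3-D suggestion, assuming all matrices in the cell array have the same size. The three 2-by-2 matrices below are hypothetical example data, not taken from the question.

```matlab
% Hypothetical example data: three equally sized matrices with some NaNs
a = {[1 NaN; 3 4], [NaN 6; 7 8], [9 10; NaN NaN]};

a3 = cat(3, a{:});          % stack into a 2-by-2-by-3 array
a3(isnan(a3)) = 0;          % one vectorized replacement for all pages

% Optional: convert back to a 1-by-3 cell array of matrices
a0 = reshape(num2cell(a3, [1 2]), 1, []);
```

Page q of a3 corresponds to a{q}, so later code can index a3(:,:,q) instead of a{q} and skip the conversion back to a cell array.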

Efficient way in MATLAB to apply the same left and right matrix multiplication to a large set of matrices

I have a lot of 2-by-2 matrices S1, S2, ..., SN, and on each of those matrices, I want to perform a left and right matrix multiplication as in R*S*R^T, where R is also a 2-by-2 matrix. Obviously I could just write this with a for loop, but I anticipate it being very slow for large N in MATLAB. Is there a simple and efficient way to accomplish this without using a for loop? Thanks in Advance!
Your biggest problem is not the loops. For matrices this small, calling MATLAB's A*B introduces a lot of overhead. The best thing you can do is to store all the matrices in a large 4 x n_matrices matrix and spell out the matrix multiplications manually:
A = rand(4, 1000);
B = rand(4, 1000);
tic;
C = zeros(size(A));
C(1,:) = A(1,:).*B(1,:) + A(3,:).*B(2,:);
C(2,:) = A(2,:).*B(1,:) + A(4,:).*B(2,:);
C(3,:) = A(1,:).*B(3,:) + A(3,:).*B(4,:);
C(4,:) = A(2,:).*B(3,:) + A(4,:).*B(4,:);
toc
Elapsed time is 0.020950 seconds.
As you see, this takes little time (this is on a 6-year-old desktop PC). For small matrices like this it is practical, and I cannot imagine anything else written in MATLAB that could beat it performance-wise. For a very large number of 2x2 matrices, you could introduce blocking (i.e., handle only a subset of the matrices at a time) to improve cache reuse.
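The code above spells out a single product A*B per column. For the two-sided product R*S*R^T from the question, one alternative (a different technique from the one above, sketched here under the same 4 x n_matrices storage assumption) uses the identity vec(R*S*R^T) = kron(R,R)*vec(S), so that one 4-by-4 matrix product handles all matrices at once. The names R, S, K and Out are illustrative.

```matlab
N = 1000;
R = rand(2);
S = rand(4, N);                 % column n holds S_n(:) in column-major order

% One matrix product handles all N matrices
K = kron(R, R);                 % 4-by-4 Kronecker product
Out = K * S;                    % column n holds (R*S_n*R.')(:)

% Loop reference for verification
max_err = 0;
for n = 1:N
    Sn = reshape(S(:,n), 2, 2);
    ref = R * Sn * R.';
    max_err = max(max_err, max(abs(Out(:,n) - ref(:))));
end
```

reshape(Out(:,n), 2, 2) recovers the n-th result as a 2-by-2 matrix. Whether this beats the spelled-out version depends on N and the machine, so it would need to be timed.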
I would say that the loop here is not that bad and not that slow; consider this:
N = 1000000
S = cell(1,N);
Out = S;
A = rand(2);
B = rand(2);
for i = 1 : N
S{i} = rand(2);
end
tic
for i = 1 : N
Out{i} = A * S{i} * B;
end
toc
tic
f = @(i) A*i*B;
Out = cellfun(f,S,'UniformOutput' , false);
toc
N =
1000000
Elapsed time is 2.609569 seconds.
Elapsed time is 9.871200 seconds.
You might think of concatenating your 2x2 matrices and then performing just 2 multiplications (transposing correctly along the way). But you will lose time in the concatenation.

What is a fast way to compute column by column correlation in matlab

I have two very large matrices (60x25000) and I'd like to compute the correlation between the columns only between the two matrices. For example:
corrVal(1) = corr(mat1(:,1), mat2(:,1));
corrVal(2) = corr(mat1(:,2), mat2(:,2));
...
corrVal(i) = corr(mat1(:,i), mat2(:,i));
For smaller matrices I can simply use:
colCorr = diag( corr( mat1, mat2 ) );
but this doesn't work for very large matrices as I run out of memory. I've considered slicing up the matrices to compute the correlations and then combining the results but it seems like a waste to compute correlation between column combinations that I'm not actually interested.
Is there a quick way to directly compute what I'm interested?
Edit: I've used a loop in the past but it's just way too slow:
mat1 = rand(60,5000);
mat2 = rand(60,5000);
nCol = size(mat1,2);
corrVal = zeros(nCol,1);
tic;
for i = 1:nCol
corrVal(i) = corr(mat1(:,i), mat2(:,i));
end
toc;
This takes ~1 second
tic;
corrVal = diag(corr(mat1,mat2));
toc;
This takes ~0.2 seconds
I can obtain a x100 speed improvement by computing it by hand.
An=bsxfun(@minus,A,mean(A,1)); %%% zero-mean
Bn=bsxfun(@minus,B,mean(B,1)); %%% zero-mean
An=bsxfun(@times,An,1./sqrt(sum(An.^2,1))); %% L2-normalization
Bn=bsxfun(@times,Bn,1./sqrt(sum(Bn.^2,1))); %% L2-normalization
C=sum(An.*Bn,1); %% correlation
You can compare using that code:
A=rand(60,25000);
B=rand(60,25000);
tic;
C=zeros(1,size(A,2));
for i = 1:size(A,2)
C(i)=corr(A(:,i), B(:,i));
end
toc;
tic
An=bsxfun(@minus,A,mean(A,1));
Bn=bsxfun(@minus,B,mean(B,1));
An=bsxfun(@times,An,1./sqrt(sum(An.^2,1)));
Bn=bsxfun(@times,Bn,1./sqrt(sum(Bn.^2,1)));
C2=sum(An.*Bn,1);
toc
mean(abs(C-C2)) %% difference between methods
Here are the computing times:
Elapsed time is 10.822766 seconds.
Elapsed time is 0.119731 seconds.
The difference between the two results is very small:
mean(abs(C-C2))
ans =
3.0968e-17
EDIT: explanation
bsxfun does a column-by-column operation (or row-by-row depending on the input).
An=bsxfun(@minus,A,mean(A,1));
This line subtracts (@minus) the mean of each column (mean(A,1)) from each column of A. So basically it makes the columns of A zero-mean.
An=bsxfun(@times,An,1./sqrt(sum(An.^2,1)));
This line multiplies (@times) each column by the inverse of its norm. So it makes them L2-normalized.
Once the columns are zero-mean and L2-normalized, to compute the correlation, you just have to take the dot product of each column of An with the corresponding column of Bn. So you multiply them element-wise, An.*Bn, and then you sum each column: sum(An.*Bn,1).
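If the Statistics Toolbox function corr is not available, the steps explained above can still be checked against corrcoef from base MATLAB. The sketch below does this on small stand-in matrices; the 60-by-8 size is an assumption made to keep the check quick.

```matlab
A = rand(60, 8);    % small stand-ins for the 60-by-25000 matrices
B = rand(60, 8);

% Column-wise correlation by hand (same steps as explained above)
An = bsxfun(@minus, A, mean(A,1));                  % zero-mean columns
Bn = bsxfun(@minus, B, mean(B,1));
An = bsxfun(@times, An, 1./sqrt(sum(An.^2,1)));     % unit L2 norm
Bn = bsxfun(@times, Bn, 1./sqrt(sum(Bn.^2,1)));
C = sum(An.*Bn, 1);                                 % 1-by-8 correlations

% Reference: one 2-by-2 correlation matrix per column pair
C_ref = zeros(1, size(A,2));
for k = 1:size(A,2)
    r = corrcoef(A(:,k), B(:,k));
    C_ref(k) = r(1,2);
end
max_err = max(abs(C - C_ref));
```

Only element-wise operations on arrays the size of A and B are involved, so the memory footprint stays far below that of the full corr(mat1, mat2) matrix.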
I think the obvious loop might be good enough for your size of problem. On my laptop it takes less than 6 seconds to do the following:
A = rand(60,25000);
B = rand(60,25000);
n = size(A,1);
m = size(A,2);
corrVal = zeros(1,m);
for k=1:m
corrVal(k) = corr(A(:,k),B(:,k));
end