Vectorization when mapping between indices in an assignment is not injective - matlab

Suppose that c is a scalar value, T and W are M-by-N matrices, k is another M-by-N matrix containing values from 1 to M (and there are at least two pairs (i1, j1), (i2, j2) such that k(i1, j1)==k(i2, j2)) and a is a 1-by-M vector. I want to vectorize the following code (hoping that this will speed it up):
T = zeros(M,N);
for j = 1:N
for i = 1:M
T(k(i,j),j) = T(k(i,j),j) + c*W(i,j)/a(i);
end
end
Do you have any tips so that I can vectorize this code (or make it faster in general)?
Thanks in advance!

Since k only ever effects how values are aggregated within a column, but not between columns, you can achieve a slight speedup by reducing the problem to a single loop over columns and using accumarray like so:
T = zeros(M, N);
for col = 1:N
T(:, col) = accumarray(k(:,col), c*W(:, col)./a, [M 1]);
end
I tested each of the solutions (the loop in your question, rahnema's, Divakar's, and mine) by taking the average of 100 iterations using input values initialized as in Divakar's answer. Here's what I got (running Windows 7 x64, 16 GB RAM, MATLAB R2016b):
solution | avg. time (s) | max(abs(err))
---------+---------------+---------------
loop | 0.12461 | 0
rahnema | 0.84518 | 0
divakar | 0.12381 | 1.819e-12
gnovice | 0.09477 | 0
The take-away: loops actually aren't so bad, but if you can simplify them into one it can save you a little time.

Here's an approach with a combination of bsxfun and accumarray -
% Create 2D array of unique IDs along each col to be used as flattened subs
id = bsxfun(#plus,k,M*(0:N-1));
% Compute "c*W(i,j)/a(i)" for all i's and j's
cWa = c*bsxfun(#rdivide,W,a);
% Accumulate final result for all cols
out = reshape(accumarray(id(:),reshape(cWa,[],1),[M*N 1]),[M,N]);
Benchmarking
Approaches as functions -
function out = func1(W,a,c,k,M,N)
id = bsxfun(#plus,k,M*(0:N-1));
cWa = c*bsxfun(#rdivide,W,a);
out = reshape(accumarray(id(:),reshape(cWa,[],1),[M*N 1]),[M,N]);
function T = func2(W,a,c,k,M,N) % #rahnema1's solution
[I J] = meshgrid(1:M,1:N);
idx1 = sub2ind([M N], I ,J);
R = c.* W(idx1) ./ a(I);
T = accumarray([k(idx1(:)) ,J(:)], R(:),[M N]);
function T = func3(W,a,c,k,M,N) % Original approach
T = zeros(M,N);
for j = 1:N
for i = 1:M
T(k(i,j),j) = T(k(i,j),j) + c*W(i,j)/a(i);
end
end
function T = func4(W,a,c,k,M,N) % #gnovice's solution
T = zeros(M, N);
for col = 1:N
T(:, col) = accumarray(k(:,col), c*W(:, col)./a, [M 1]);
end
Machine setup : Kubuntu 16.04, MATLAB 2012a, 4GB RAM.
Timing code -
% Setup inputs
M = 3000;
N = 3000;
W = rand(M,N);
a = rand(M,1);
c = 2.34;
k = randi([1,M],[M,N]);
disp('------------------ With func1')
tic,out = func1(W,a,c,k,M,N);toc
clear out
disp('------------------ With func2')
tic,out = func2(W,a,c,k,M,N);toc
clear out
disp('------------------ With func3')
tic,out = func3(W,a,c,k,M,N);toc
clear out
disp('------------------ With func4')
tic,out = func4(W,a,c,k,M,N);toc
Timing code run -
------------------ With func1
Elapsed time is 0.215591 seconds.
------------------ With func2
Elapsed time is 1.555373 seconds.
------------------ With func3
Elapsed time is 0.572668 seconds.
------------------ With func4
Elapsed time is 0.291552 seconds.
Possible improvements in proposed approach
1] In c*bsxfun(#rdivide,W,a), we are use two stages of broadcasting - One at bsxfun(#rdivide,W,a), where a is broadcasted ; Second one when c is broadcasted to match-up against the 2D output of bsxfun(#rdivide,W,a), though we don't need bsxfun for this one. So, a possible improvement would be if we insert-in c to be divided by a, where c would be only broadcasted to 1D, instead of 2D and then the second level of broadcasting would be1D: c/a to 2D : W just like before. This minor improvement could be timed -
>> tic, c*bsxfun(#rdivide,W,a); toc
Elapsed time is 0.073244 seconds.
>> tic, bsxfun(#times,W,c/a); toc
Elapsed time is 0.041745 seconds.
But, in cases where c and a differ by a lot, the scaling factor c/a would affect the final result by appreciably. So, one need to be careful with this suggestion.

A possible solution:
[I J] = meshgrid(1:M,1:N);
idx1 = sub2ind([M N], I ,J);
R = c.* W(idx1) ./ a(I);
T = accumarray([K(idx1(:)) ,J(:)], R(:),[M N]);
Comparison of different methods in Octave without jit:
------------------ Divakar
Elapsed time is 0.282008 seconds.
------------------ rahnema1
Elapsed time is 1.08827 seconds.
------------------ gnovice
Elapsed time is 0.418701 seconds.
------------------ loop
doesn't complete in 15 seconds.

Related

Computational complexity of expanding a vector using its index in MATLAB

suppose I have a n-by-1 vector A, and a m-by-1 all integer vector b, where max(b)<=n, min(b)>0. Can anyone tell me what is the computational complexity (in big-O notation) for performing command A(b) in MATLAB?
As I found and get a test from A with n = 100, 1000, 10^4, ..., 10^7 and m = 100 which A and b generated randomly, average time on the same machine for all case of n are the same (also, approximately, all times are the same, and equail to 5.00679e-05 for A(b)). Part of the code is like below (you should be aware that to get the more accurate, you should run in 100 times and get an average. Also, get the variance to see that values are not so different):
A = randi(100,100,1);
b = randi(100,100,1);
tic; A(b); toc
> Elapsed time is 5.38826e-05 seconds.
A = randi(100,1000,1);
b = randi(100,100,1);
tic; A(b); toc
> Elapsed time is 6.31809e-05 seconds.
A = randi(100,10000,1);
b = randi(100,100,1);
tic; A(b); toc
> Elapsed time is 4.88758e-05 seconds.
...
% Also, as you can see the range of the b is growth, base on the size of A
% However, I can't see any changes in the time complexity scale.
A = randi(100,1000000,1);
b = randi(1000000,100,1);
tic; A(b); toc
> Elapsed time is 6.60419e-05 seconds.
6.60419e-05
Also, run test over n = 100 and m = 10000, 10^4, ..., 10^7, and on each run I got 10^-5, 10^-4, and so on. In this way, run this on different n, and I got the same result. In the below as the same as the previous part, you can follow the following:
A = randi(100,100,1);
b = randi(100,10000,1);
tic;A(b);toc
> Elapsed time is 7.00951e-05 seconds.
A = randi(100,100,1);
b = randi(100,100000,1);
tic;A(b);toc
> Elapsed time is 0.000529051 seconds.
A = randi(100,100,1);
b = randi(100,1000000,1);
tic;A(b);toc
> Elapsed time is 0.00533104 seconds.
...
For the case of be more aacurate, you can run the code like the following:
a = zeros(100,1);
for idx = 1:100
A = randi(100,100,1);b = randi(100,100000,1);
tic; A(b);a(idx) = toc;
end
var(a)
> ans = 4.4092e-10 % very little
mean(a)
> ans = 3.6702e-04 % 10^-4 scale
As you can see, the variance is very little, and mean scale is 10^-4. Hence, you can approve the others by the same method.
Therefore, base on the above analysis and common index methods, We can say the time complexity is not dependent on n and dependent on size of the b. In sum, the time complexity is O(m).
The complexity is 'unusual'
I came up with a short test script, and ran it a few times. If the size of B increases, there is a clear increase in the required time. However, the results get confusing if you look at the size of A.
R = zeros(3,4);
for m = 10.^[4 5 6]
mc = mc+1;
nc =0;
for n = 10.^[1 2 3 4]
nc = nc+1;
A = rand(m,1);
b = randi(m, n, 1);
tic; A(b); R(mc,nc) = toc*10^5;
end
end
R
The above script gave the following results (also similar results when repeated). Note that this was in Octave Online, rather than Matlab.
R =
3.2902 1.7881 2.3127 9.3937
5.3167 2.0027 2.8133 8.2970
21.6007 3.3140 4.9829 17.3092

Vectorizing matrix multiplication inside a tensor

I'm having some trouble vectorizing a part of my code. I have a (n,n,m) tensor, and I want to multiply each slice in m by a second (n by n) matrix (NOT element wise).
Here's what it looks like as a for-loop:
Tensor=zeros(2,2,3);
Matrix = [1,2; 3,4];
for j=1:n
Matrices_Multiplied = Tensor(:,:,j)*Matrix;
Recursive_Matrix=Recursive_Matrix + Tensor(:,:,j)/trace(Matrices_Multiplied);
end
How do I perform matrix multiplication for individual matrices inside a tensor in a vectorized manner? Is there a built-in function like tensor-dot that can handle this or is it more clever?
Bsxfunning and using efficient matrix-multiplication, we could do -
% Calculate trace values using matrix-multiplication
T = reshape(Matrix.',1,[])*reshape(Tensor,[],size(Tensor,3));
% Use broadcasting to perform elementwise division across all slices
out = sum(bsxfun(#rdivide,Tensor,reshape(T,1,1,[])),3);
Again, one can replace the last step with one more matrix-multiplication for possible further boost in performance. Thus, an all matrix-multiplication dedicated solution would be -
[m,n,r] = size(Tensor);
out = reshape(reshape(Tensor,[],size(Tensor,3))*(1./T.'),m,n)
Runtime test
Benchmarking code -
% Input arrays
n = 100; m = 100;
Tensor=rand(n,n,m);
Matrix=rand(n,n);
num_iter = 100; % Number of iterations to be run for
tic
disp('------------ Loopy woopy doops : ')
for iter = 1:num_iter
Recursive_Matrix = zeros(n,n);
for j=1:n
Matrices_Multiplied = Tensor(:,:,j)*Matrix;
Recursive_Matrix=Recursive_Matrix+Tensor(:,:,j)/trace(Matrices_Multiplied);
end
end
toc, clear iter Recursive_Matrix Matrices_Multiplied
tic
disp('------------- Bsxfun matrix-mul not so dull : ')
for iter = 1:num_iter
T = reshape(Matrix.',1,[])*reshape(Tensor,[],size(Tensor,3));
out = sum(bsxfun(#rdivide,Tensor,reshape(T,1,1,[])),3);
end
toc, clear T out
tic
disp('-------------- All matrix-mul having a ball : ')
for iter = 1:num_iter
T = reshape(Matrix.',1,[])*reshape(Tensor,[],size(Tensor,3));
[m,n,r] = size(Tensor);
out = reshape(reshape(Tensor,[],size(Tensor,3))*(1./T.'),m,n);
end
toc
Timings -
------------ Loopy woopy doops :
Elapsed time is 3.339464 seconds.
------------- Bsxfun matrix-mul not so dull :
Elapsed time is 1.354137 seconds.
-------------- All matrix-mul having a ball :
Elapsed time is 0.373712 seconds.

Multidimensional version of "kron" product?

Now I have a matrix A of dimension N by p, and the other matrix B of dimension N by q. What I want to have is a matrix, say C, of dimension N by pq such that
C(i,:) = kron(A(i,:), B(i,:));
If N is large, loop over N rows may take quite long time. So currently I am augmenting A and B appropriately(combining usage of repmat, permute and reshape) to turn each matrix of dimension N by pq, and then formulating C by something like
C = A_aug .* B_aug;
Any better idea?
Checkout some bsxfun + permute + reshape magic -
out = reshape(bsxfun(#times,permute(A,[1 3 2]),B),size(A,1),[])
Benchmarking & Verification
Benchmarking code -
%// Setup inputs
N = 200;
p = 190;
q = 180;
A = rand(N,p);
B = rand(N,q);
disp('--------------------------------------- Without magic')
tic
C = zeros(size(A,1),size(A,2)*size(B,2));
for i = 1:size(A,1)
C(i,:) = kron(A(i,:), B(i,:));
end
toc
disp('--------------------------------------- With some magic')
tic
out = reshape(bsxfun(#times,permute(A,[1 3 2]),B),size(A,1),[]);
toc
error_val = max(abs(C(:)-out(:)))
Output -
--------------------------------------- Without magic
Elapsed time is 0.524396 seconds.
--------------------------------------- With some magic
Elapsed time is 0.055082 seconds.
error_val =
0

how to sum the elements specified by a cell in matlab?

I have a big matrix M (nxm). I am going to sum some elements which are specified by index stored in vector as cell elements. There are many groups of indices so the cell has more than one element. For example
M = rand(2103, 2030);
index{1} = [1 3 2 4 53 5 23 3];
index{2} = [2 3 1 3 23 10234 2032];
% ...
index{2032} = ...;
I am going to sum up all elements at index{1}, sum up all elements at index{2} ..., now I am using a loop
sums = zeros(1, 2032);
for n=1:2032
sums(n) = sum(M(index{n}));
end
I am wondering if there is any way to use one-line command instead of a loop to do that. Using a loop is pretty slow.
Probably a classic use of cellfun
sums = cellfun(#(idx) sum(M(idx)), index);
EDIT: here is a benchmarking for a large case that shows that this approach is slightly slower than a for loop but faster than Eitan T's method
M = rand(2103, 2030);
index = cell(1, 2032);
index{1} = [1 3 2 4 53 5 23 3];
index{2} = [2 3 1 3 23 10234 2032];
for n=3:2032
index{n} = randi(numel(M), 1, randi(10000));
end
N = 1e1;
sums = zeros(1, 2032);
tic
for kk = 1:N
for n=1:2032
sums(n) = sum(M(index{n}));
end
end
toc
tic
for kk = 1:N
sums = cellfun(#(idx) sum(M(idx)), index);
end
toc
tic
for kk = 1:N
sums = cumsum(M([index{:}]));
sums = diff([0, sums(cumsum(cellfun('length', index)))]);
end
toc
results in
Elapsed time is 2.072292 seconds.
Elapsed time is 2.139882 seconds.
Elapsed time is 2.669894 seconds.
Perhaps not as elegant as a cellfun one-liner, but runs more than an order of magnitude faster:
sums = cumsum(M([index{:}]));
sums = diff([0, sums(cumsum(cellfun('length', index)))]);
It even runs approximately 4 or 5 times faster than a JIT-accelerated loop for large inputs. Note that when each cell in index contains a vector with more than ~2000 elements, the performance of this approach begins to deteriorate in comparison with a loop (and cellfun).
Benchmark
M = rand(2103, 2030);
I = ceil(numel(M) * rand(2032, 10));
index = mat2cell(I, ones(size(I, 1), 1), size(I, 2));
N = 100;
tic
for k = 1:N
sums = zeros(1, numel(index));
for n = 1:numel(sums)
sums(n) = sum(M(index{n}));
end
end
toc
tic
for k = 1:N
sums = cellfun(#(idx) sum(M(idx)), index);
end
toc
tic
for k = 1:N
sums = cumsum(M([index{:}]));
sums2 = diff([0, sums(cumsum(cellfun('length', index)))]);
end
toc
When executing this in MATLAB 2012a (Windows Server 2008 R2 running on a 2.27GHz 16-core Intel Xeon processor), I got:
Elapsed time is 0.579783 seconds.
Elapsed time is 1.789809 seconds.
Elapsed time is 0.111455 seconds.

Multiply a 3D matrix with a 2D matrix

Suppose I have an AxBxC matrix X and a BxD matrix Y.
Is there a non-loop method by which I can multiply each of the C AxB matrices with Y?
As a personal preference, I like my code to be as succinct and readable as possible.
Here's what I would have done, though it doesn't meet your 'no-loops' requirement:
for m = 1:C
Z(:,:,m) = X(:,:,m)*Y;
end
This results in an A x D x C matrix Z.
And of course, you can always pre-allocate Z to speed things up by using Z = zeros(A,D,C);.
You can do this in one line using the functions NUM2CELL to break the matrix X into a cell array and CELLFUN to operate across the cells:
Z = cellfun(#(x) x*Y,num2cell(X,[1 2]),'UniformOutput',false);
The result Z is a 1-by-C cell array where each cell contains an A-by-D matrix. If you want Z to be an A-by-D-by-C matrix, you can use the CAT function:
Z = cat(3,Z{:});
NOTE: My old solution used MAT2CELL instead of NUM2CELL, which wasn't as succinct:
[A,B,C] = size(X);
Z = cellfun(#(x) x*Y,mat2cell(X,A,B,ones(1,C)),'UniformOutput',false);
Here's a one-line solution (two if you want to split into 3rd dimension):
A = 2;
B = 3;
C = 4;
D = 5;
X = rand(A,B,C);
Y = rand(B,D);
%# calculate result in one big matrix
Z = reshape(reshape(permute(X, [2 1 3]), [A B*C]), [B A*C])' * Y;
%'# split into third dimension
Z = permute(reshape(Z',[D A C]),[2 1 3]);
Hence now: Z(:,:,i) contains the result of X(:,:,i) * Y
Explanation:
The above may look confusing, but the idea is simple.
First I start by take the third dimension of X and do a vertical concatenation along the first dim:
XX = cat(1, X(:,:,1), X(:,:,2), ..., X(:,:,C))
... the difficulty was that C is a variable, hence you can't generalize that expression using cat or vertcat. Next we multiply this by Y:
ZZ = XX * Y;
Finally I split it back into the third dimension:
Z(:,:,1) = ZZ(1:2, :);
Z(:,:,2) = ZZ(3:4, :);
Z(:,:,3) = ZZ(5:6, :);
Z(:,:,4) = ZZ(7:8, :);
So you can see it only requires one matrix multiplication, but you have to reshape the matrix before and after.
I'm approaching the exact same issue, with an eye for the most efficient method. There are roughly three approaches that i see around, short of using outside libraries (i.e., mtimesx):
Loop through slices of the 3D matrix
repmat-and-permute wizardry
cellfun multiplication
I recently compared all three methods to see which was quickest. My intuition was that (2) would be the winner. Here's the code:
% generate data
A = 20;
B = 30;
C = 40;
D = 50;
X = rand(A,B,C);
Y = rand(B,D);
% ------ Approach 1: Loop (via #Zaid)
tic
Z1 = zeros(A,D,C);
for m = 1:C
Z1(:,:,m) = X(:,:,m)*Y;
end
toc
% ------ Approach 2: Reshape+Permute (via #Amro)
tic
Z2 = reshape(reshape(permute(X, [2 1 3]), [A B*C]), [B A*C])' * Y;
Z2 = permute(reshape(Z2',[D A C]),[2 1 3]);
toc
% ------ Approach 3: cellfun (via #gnovice)
tic
Z3 = cellfun(#(x) x*Y,num2cell(X,[1 2]),'UniformOutput',false);
Z3 = cat(3,Z3{:});
toc
All three approaches produced the same output (phew!), but, surprisingly, the loop was the fastest:
Elapsed time is 0.000418 seconds.
Elapsed time is 0.000887 seconds.
Elapsed time is 0.001841 seconds.
Note that the times can vary quite a lot from one trial to another, and sometimes (2) comes out the slowest. These differences become more dramatic with larger data. But with much bigger data, (3) beats (2). The loop method is still best.
% pretty big data...
A = 200;
B = 300;
C = 400;
D = 500;
Elapsed time is 0.373831 seconds.
Elapsed time is 0.638041 seconds.
Elapsed time is 0.724581 seconds.
% even bigger....
A = 200;
B = 200;
C = 400;
D = 5000;
Elapsed time is 4.314076 seconds.
Elapsed time is 11.553289 seconds.
Elapsed time is 5.233725 seconds.
But the loop method can be slower than (2), if the looped dimension is much larger than the others.
A = 2;
B = 3;
C = 400000;
D = 5;
Elapsed time is 0.780933 seconds.
Elapsed time is 0.073189 seconds.
Elapsed time is 2.590697 seconds.
So (2) wins by a big factor, in this (maybe extreme) case. There may not be an approach that is optimal in all cases, but the loop is still pretty good, and best in many cases. It is also best in terms of readability. Loop away!
Nope. There are several ways, but it always comes out in a loop, direct or indirect.
Just to please my curiosity, why would you want that anyway ?
To answer the question, and for readability, please see:
ndmult, by ajuanpi (Juan Pablo Carbajal), 2013, GNU GPL
Input
2 arrays
dim
Example
nT = 100;
t = 2*pi*linspace (0,1,nT)’;
# 2 experiments measuring 3 signals at nT timestamps
signals = zeros(nT,3,2);
signals(:,:,1) = [sin(2*t) cos(2*t) sin(4*t).^2];
signals(:,:,2) = [sin(2*t+pi/4) cos(2*t+pi/4) sin(4*t+pi/6).^2];
sT(:,:,1) = signals(:,:,1)’;
sT(:,:,2) = signals(:,:,2)’;
G = ndmult (signals,sT,[1 2]);
Source
Original source. I added inline comments.
function M = ndmult (A,B,dim)
dA = dim(1);
dB = dim(2);
# reshape A into 2d
sA = size (A);
nA = length (sA);
perA = [1:(dA-1) (dA+1):(nA-1) nA dA](1:nA);
Ap = permute (A, perA);
Ap = reshape (Ap, prod (sA(perA(1:end-1))), sA(perA(end)));
# reshape B into 2d
sB = size (B);
nB = length (sB);
perB = [dB 1:(dB-1) (dB+1):(nB-1) nB](1:nB);
Bp = permute (B, perB);
Bp = reshape (Bp, sB(perB(1)), prod (sB(perB(2:end))));
# multiply
M = Ap * Bp;
# reshape back to original format
s = [sA(perA(1:end-1)) sB(perB(2:end))];
M = squeeze (reshape (M, s));
endfunction
I highly recommend you use the MMX toolbox of matlab. It can multiply n-dimensional matrices as fast as possible.
The advantages of MMX are:
It is easy to use.
Multiply n-dimensional matrices (actually it can multiply arrays of 2-D matrices)
It performs other matrix operations (transpose, Quadratic Multiply, Chol decomposition and more)
It uses C compiler and multi-thread computation for speed up.
For this problem, you just need to write this command:
C=mmx('mul',X,Y);
here is a benchmark for all possible methods. For more detail refer to this question.
1.6571 # FOR-loop
4.3110 # ARRAYFUN
3.3731 # NUM2CELL/FOR-loop/CELL2MAT
2.9820 # NUM2CELL/CELLFUN/CELL2MAT
0.0244 # Loop Unrolling
0.0221 # MMX toolbox <===================
I would like to share my answer to the problems of:
1) making the tensor product of two tensors (of any valence);
2) making the contraction of two tensors along any dimension.
Here are my subroutines for the first and second tasks:
1) tensor product:
function [C] = tensor(A,B)
C = squeeze( reshape( repmat(A(:), 1, numel(B)).*B(:).' , [size(A),size(B)] ) );
end
2) contraction:
Here A and B are the tensors to be contracted along the dimesions i and j respectively. The lengths of these dimensions should be equal, of course. There's no check for this (this would obscure the code) but apart from this it works well.
function [C] = tensorcontraction(A,B, i,j)
sa = size(A);
La = length(sa);
ia = 1:La;
ia(i) = [];
ia = [ia i];
sb = size(B);
Lb = length(sb);
ib = 1:Lb;
ib(j) = [];
ib = [j ib];
% making the i-th dimension the last in A
A1 = permute(A, ia);
% making the j-th dimension the first in B
B1 = permute(B, ib);
% making both A and B 2D-matrices to make use of the
% matrix multiplication along the second dimension of A
% and the first dimension of B
A2 = reshape(A1, [],sa(i));
B2 = reshape(B1, sb(j),[]);
% here's the implicit implication that sa(i) == sb(j),
% otherwise - crash
C2 = A2*B2;
% back to the original shape with the exception
% of dimensions along which we've just contracted
sa(i) = [];
sb(j) = [];
C = squeeze( reshape( C2, [sa,sb] ) );
end
Any critics?
I would think recursion, but that's the only other non- loop method you can do
You could "unroll" the loop, ie write out all the multiplications sequentially that would occur in the loop