Vectorizing matrix multiplication inside a tensor - MATLAB

I'm having some trouble vectorizing a part of my code. I have an (n,n,m) tensor, and I want to multiply each of the m slices by a second (n-by-n) matrix (NOT element-wise).
Here's what it looks like as a for-loop:
Tensor = zeros(2,2,3);
Matrix = [1,2; 3,4];
Recursive_Matrix = zeros(2,2); % accumulator for the result
for j = 1:size(Tensor,3) % loop over the m slices
Matrices_Multiplied = Tensor(:,:,j)*Matrix;
Recursive_Matrix = Recursive_Matrix + Tensor(:,:,j)/trace(Matrices_Multiplied);
end
How do I perform matrix multiplication for individual matrices inside a tensor in a vectorized manner? Is there a built-in function like a tensor dot product that can handle this, or is it something more clever?

Bsxfunning and using efficient matrix-multiplication, we could do -
% Calculate trace values using matrix-multiplication
T = reshape(Matrix.',1,[])*reshape(Tensor,[],size(Tensor,3));
% Use broadcasting to perform elementwise division across all slices
out = sum(bsxfun(@rdivide,Tensor,reshape(T,1,1,[])),3);
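The first line works because the trace of a product is just a dot product of the flattened matrices: trace(A*B) equals the sum of all elements of A .* B.'. A quick check of that identity on small random inputs -
A = rand(3); B = rand(3);
abs(trace(A*B) - reshape(B.',1,[])*A(:)) % of the order of 1e-16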
One can replace the last step with one more matrix multiplication for a possible further boost in performance. Thus, an all matrix-multiplication dedicated solution would be -
[m,n,r] = size(Tensor);
out = reshape(reshape(Tensor,[],size(Tensor,3))*(1./T.'),m,n)
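As a quick sanity check that the all matrix-multiplication version reproduces the original loop (a sketch on small random inputs, since the all-zero tensor from the question would give zero traces) -
n = 4; m = 5;
Tensor = rand(n,n,m); Matrix = rand(n,n);
Recursive_Matrix = zeros(n,n);
for j = 1:m
Recursive_Matrix = Recursive_Matrix + Tensor(:,:,j)/trace(Tensor(:,:,j)*Matrix);
end
T = reshape(Matrix.',1,[])*reshape(Tensor,[],m);
out = reshape(reshape(Tensor,[],m)*(1./T.'),n,n);
max(abs(Recursive_Matrix(:) - out(:))) % expected to be of the order of 1e-15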
Runtime test
Benchmarking code -
% Input arrays
n = 100; m = 100;
Tensor=rand(n,n,m);
Matrix=rand(n,n);
num_iter = 100; % Number of iterations to be run for
tic
disp('------------ Loopy woopy doops : ')
for iter = 1:num_iter
Recursive_Matrix = zeros(n,n);
for j=1:n
Matrices_Multiplied = Tensor(:,:,j)*Matrix;
Recursive_Matrix=Recursive_Matrix+Tensor(:,:,j)/trace(Matrices_Multiplied);
end
end
toc, clear iter Recursive_Matrix Matrices_Multiplied
tic
disp('------------- Bsxfun matrix-mul not so dull : ')
for iter = 1:num_iter
T = reshape(Matrix.',1,[])*reshape(Tensor,[],size(Tensor,3));
out = sum(bsxfun(@rdivide,Tensor,reshape(T,1,1,[])),3);
end
toc, clear T out
tic
disp('-------------- All matrix-mul having a ball : ')
for iter = 1:num_iter
T = reshape(Matrix.',1,[])*reshape(Tensor,[],size(Tensor,3));
[m,n,r] = size(Tensor);
out = reshape(reshape(Tensor,[],size(Tensor,3))*(1./T.'),m,n);
end
toc
Timings -
------------ Loopy woopy doops :
Elapsed time is 3.339464 seconds.
------------- Bsxfun matrix-mul not so dull :
Elapsed time is 1.354137 seconds.
-------------- All matrix-mul having a ball :
Elapsed time is 0.373712 seconds.

How to speed up an iterative function call in MATLAB?

In MATLAB I have to call the CDF of the t distribution (tcdf) iteratively (since the next input value depends on the previous output of tcdf), which unfortunately slows down my code massively.
tic
z = NaN(1e5,1);
z(1) = 1;
x = 2;
for ii = 2:1e5
x = tcdf(z(ii-1),x);
z(ii) = z(ii-1)*x;
end
toc
Elapsed time is 4.717087 seconds.
Is there a way to speed this up somehow?
For comparison:
tic
z = randn(1e5,1);
tcdf(z,5);
toc
Elapsed time is 0.091353 seconds.
If the expensive tcdf call can be moved outside the loop (that is, when the sequential recursion itself only involves cheap arithmetic), keep the recursion in the loop and apply tcdf to the whole vector once at the end:
numVal = 1e5;
z = randn(numVal,1);
for ii = 2:numVal
z(ii) = z(ii-1) + z(ii); % cheap sequential update stays in the loop
end
p = tcdf(z,5); % one vectorized call to the expensive function
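If the per-call overhead of tcdf itself dominates, another option (a sketch, assuming the Statistics Toolbox two-argument tcdf and relying on the standard identity relating the Student's t CDF to the regularized incomplete beta function betainc) is to evaluate the CDF directly inside the loop:
tic
z = NaN(1e5,1);
z(1) = 1;
x = 2; % degrees of freedom, updated each iteration as in the question
for ii = 2:1e5
t = z(ii-1);
p = 0.5*betainc(x./(x + t.^2), x/2, 0.5); % lower-tail value, valid for t < 0
if t >= 0, p = 1 - p; end % mirror for t >= 0
x = p;
z(ii) = z(ii-1)*x;
end
toc
This keeps the sequential dependence but skips the argument checking that tcdf performs on every call; whether it pays off, and that it matches tcdf on a few test values, should be verified first.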

Convert point cloud to voxels via averaging

I have the following data:
N = 10^3;
x = randn(N,1);
y = randn(N,1);
z = randn(N,1);
f = x.^2+y.^2+z.^2;
Now I want to split this continuous 3D space into nB bins.
nB = 20;
[~,~,x_bins] = histcounts(x,nB);
[~,~,y_bins] = histcounts(y,nB);
[~,~,z_bins] = histcounts(z,nB);
And put the average of f in each cube, or NaN if no observations fall in that cube:
F = nan(nB,nB,nB);
for iX = 1:nB
for iY = 1:nB
for iZ = 1:nB
idx = (x_bins==iX)&(y_bins==iY)&(z_bins==iZ);
F(iX,iY,iZ) = mean(f(idx));
end
end
end
isosurface(F,0.5)
This code does what I want. My problem is the speed. This code is extremely slow when N > 10^5 and nB = 100.
How can I speed up this code?
I also tried the accumarray() function:
subs=([x_bins,y_bins,z_bins]);
F2 = accumarray(subs,f,[],@mean);
all(F(:) == F2(:)) % false
However, this code produces a different result.
The problem with the code in the OP is that it tests all elements of the data for each element in the output array. The output array has nB^3 elements and the data has N elements, so the algorithm is O(N*nB^3). Instead, one can loop over the N elements of the input and update the corresponding element in the output array, which is O(N) (2nd code block below).
The accumarray solution in the OP needs to use the fillval parameter, setting it to NaN (3rd code block below).
To compare the results, one needs to explicitly test that both arrays have NaN in the same locations, and have equal non-NaN values elsewhere:
all( ( isnan(F(:)) & isnan(F2(:)) ) | ( F(:) == F2(:) ) )
% \-------same NaN values------/ \--same values--/
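Alternatively, isequaln (available in recent MATLAB and Octave releases) treats NaN values as equal, so the whole comparison collapses to one call:
isequaln(F, F2) % true when the arrays match, including NaN positions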
Here is the code. All three versions produce identical results. Timings are from Octave 4.4.1 (no JIT); in MATLAB the loop code should be faster. (Using input data from the OP, with N = 10^3 and nB = 20.)
%% OP's code, O(N*nB^3)
tic
F = nan(nB,nB,nB);
for iX = 1:nB
for iY = 1:nB
for iZ = 1:nB
idx = (x_bins==iX)&(y_bins==iY)&(z_bins==iZ);
F(iX,iY,iZ) = mean(f(idx));
end
end
end
toc
% Elapsed time is 1.61736 seconds.
%% Looping over input, O(N)
tic
s = zeros(nB,nB,nB);
c = zeros(nB,nB,nB);
ind = sub2ind([nB,nB,nB],x_bins,y_bins,z_bins);
for ii=1:N
s(ind(ii)) = s(ind(ii)) + f(ii);
c(ind(ii)) = c(ind(ii)) + 1;
end
F2 = s ./ c;
toc
% Elapsed time is 0.0606539 seconds.
%% Other alternative, using accumarray
tic
ind = sub2ind([nB,nB,nB],x_bins,y_bins,z_bins);
F3 = accumarray(ind,f,[nB,nB,nB],@mean,NaN);
toc
% Elapsed time is 0.14113 seconds.

Vectorization when mapping between indices in an assignment is not injective

Suppose that c is a scalar value, T and W are M-by-N matrices, k is another M-by-N matrix containing values from 1 to M (and there are at least two pairs (i1, j1), (i2, j2) such that k(i1, j1)==k(i2, j2)) and a is a 1-by-M vector. I want to vectorize the following code (hoping that this will speed it up):
T = zeros(M,N);
for j = 1:N
for i = 1:M
T(k(i,j),j) = T(k(i,j),j) + c*W(i,j)/a(i);
end
end
Do you have any tips so that I can vectorize this code (or make it faster in general)?
Thanks in advance!
Since k only ever affects how values are aggregated within a column, but not between columns, you can achieve a slight speedup by reducing the problem to a single loop over columns and using accumarray like so:
T = zeros(M, N);
for col = 1:N
T(:, col) = accumarray(k(:,col), c*W(:, col)./a(:), [M 1]);
end
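A quick check on small random inputs that the column-wise accumarray call reproduces the double loop (a sketch; a is taken as a 1-by-M row vector as in the question, hence the a(:)):
M = 5; N = 4; c = 2.34;
W = rand(M,N); a = rand(1,M); k = randi([1,M],[M,N]);
T1 = zeros(M,N);
for j = 1:N
for i = 1:M
T1(k(i,j),j) = T1(k(i,j),j) + c*W(i,j)/a(i);
end
end
T2 = zeros(M,N);
for col = 1:N
T2(:,col) = accumarray(k(:,col), c*W(:,col)./a(:), [M 1]);
end
max(abs(T1(:) - T2(:))) % 0 or within floating-point noise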
I tested each of the solutions (the loop in your question, rahnema's, Divakar's, and mine) by taking the average of 100 iterations using input values initialized as in Divakar's answer. Here's what I got (running Windows 7 x64, 16 GB RAM, MATLAB R2016b):
solution | avg. time (s) | max(abs(err))
---------+---------------+---------------
loop | 0.12461 | 0
rahnema | 0.84518 | 0
divakar | 0.12381 | 1.819e-12
gnovice | 0.09477 | 0
The take-away: loops actually aren't so bad, but if you can simplify them into one it can save you a little time.
Here's an approach with a combination of bsxfun and accumarray -
% Create 2D array of unique IDs along each col to be used as flattened subs
id = bsxfun(@plus,k,M*(0:N-1));
% Compute "c*W(i,j)/a(i)" for all i's and j's
cWa = c*bsxfun(@rdivide,W,a);
% Accumulate final result for all cols
out = reshape(accumarray(id(:),reshape(cWa,[],1),[M*N 1]),[M,N]);
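The id array is simply the column-major linear index of (k(i,j), j) into an M-by-N matrix, i.e. what sub2ind would produce; a quick check on small inputs -
M = 4; N = 3; k = randi(M, M, N);
J = repmat(1:N, M, 1); % column index of each element
isequal(bsxfun(@plus,k,M*(0:N-1)), sub2ind([M N], k, J)) % returns true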
Benchmarking
Approaches as functions -
function out = func1(W,a,c,k,M,N)
id = bsxfun(@plus,k,M*(0:N-1));
cWa = c*bsxfun(@rdivide,W,a);
out = reshape(accumarray(id(:),reshape(cWa,[],1),[M*N 1]),[M,N]);
function T = func2(W,a,c,k,M,N) % @rahnema1's solution
[I J] = meshgrid(1:M,1:N);
idx1 = sub2ind([M N], I ,J);
R = c.* W(idx1) ./ a(I);
T = accumarray([k(idx1(:)) ,J(:)], R(:),[M N]);
function T = func3(W,a,c,k,M,N) % Original approach
T = zeros(M,N);
for j = 1:N
for i = 1:M
T(k(i,j),j) = T(k(i,j),j) + c*W(i,j)/a(i);
end
end
function T = func4(W,a,c,k,M,N) % @gnovice's solution
T = zeros(M, N);
for col = 1:N
T(:, col) = accumarray(k(:,col), c*W(:, col)./a, [M 1]);
end
Machine setup : Kubuntu 16.04, MATLAB 2012a, 4GB RAM.
Timing code -
% Setup inputs
M = 3000;
N = 3000;
W = rand(M,N);
a = rand(M,1);
c = 2.34;
k = randi([1,M],[M,N]);
disp('------------------ With func1')
tic,out = func1(W,a,c,k,M,N);toc
clear out
disp('------------------ With func2')
tic,out = func2(W,a,c,k,M,N);toc
clear out
disp('------------------ With func3')
tic,out = func3(W,a,c,k,M,N);toc
clear out
disp('------------------ With func4')
tic,out = func4(W,a,c,k,M,N);toc
Timing code run -
------------------ With func1
Elapsed time is 0.215591 seconds.
------------------ With func2
Elapsed time is 1.555373 seconds.
------------------ With func3
Elapsed time is 0.572668 seconds.
------------------ With func4
Elapsed time is 0.291552 seconds.
Possible improvements in proposed approach
1] In c*bsxfun(@rdivide,W,a), we use two stages of broadcasting: one inside bsxfun(@rdivide,W,a), where a is broadcast against W, and a second one when the scalar c is multiplied with the 2D output of bsxfun(@rdivide,W,a) (no bsxfun is needed for that one). A possible improvement is to fold c into the division by a first, so that c is broadcast only against the 1D vector a, leaving a single broadcast of the 1D c./a against the 2D W, just like before. This minor improvement can be timed -
>> tic, c*bsxfun(@rdivide,W,a); toc
Elapsed time is 0.073244 seconds.
>> tic, bsxfun(@times,W,c./a); toc
Elapsed time is 0.041745 seconds.
But in cases where c and a differ by a lot, precomputing the scaling factor c./a can affect the final result appreciably, since the rounding then happens before the multiplication with W. So one needs to be careful with this suggestion.
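Folding that improvement back into the earlier approach gives the following variant (a sketch; only the construction of cWa changes, the accumarray step is the same as in func1 above):
id = bsxfun(@plus,k,M*(0:N-1)); % flattened subscripts, as before
cWa = bsxfun(@times,W,c./a(:)); % single broadcast: M-by-1 c./a(:) against M-by-N W
out = reshape(accumarray(id(:),cWa(:),[M*N 1]),[M,N]);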
A possible solution:
[I J] = meshgrid(1:M,1:N);
idx1 = sub2ind([M N], I ,J);
R = c.* W(idx1) ./ a(I);
T = accumarray([k(idx1(:)), J(:)], R(:), [M N]);
Comparison of the different methods in Octave without JIT:
------------------ Divakar
Elapsed time is 0.282008 seconds.
------------------ rahnema1
Elapsed time is 1.08827 seconds.
------------------ gnovice
Elapsed time is 0.418701 seconds.
------------------ loop
doesn't complete in 15 seconds.

Multidimensional version of "kron" product?

Now I have a matrix A of dimension N by p, and another matrix B of dimension N by q. What I want is a matrix, say C, of dimension N by pq such that
C(i,:) = kron(A(i,:), B(i,:));
If N is large, looping over the N rows may take quite a long time. So currently I am augmenting A and B appropriately (combining repmat, permute and reshape) to turn each into a matrix of dimension N by pq, and then forming C by something like
C = A_aug .* B_aug;
Any better idea?
Check out some bsxfun + permute + reshape magic -
out = reshape(bsxfun(@times,permute(A,[1 3 2]),B),size(A,1),[])
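On R2016b or newer, implicit expansion makes the bsxfun call unnecessary, and the same operation can be written as -
out = reshape(permute(A,[1 3 2]).*B,size(A,1),[])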
Benchmarking & Verification
Benchmarking code -
%// Setup inputs
N = 200;
p = 190;
q = 180;
A = rand(N,p);
B = rand(N,q);
disp('--------------------------------------- Without magic')
tic
C = zeros(size(A,1),size(A,2)*size(B,2));
for i = 1:size(A,1)
C(i,:) = kron(A(i,:), B(i,:));
end
toc
disp('--------------------------------------- With some magic')
tic
out = reshape(bsxfun(@times,permute(A,[1 3 2]),B),size(A,1),[]);
toc
error_val = max(abs(C(:)-out(:)))
Output -
--------------------------------------- Without magic
Elapsed time is 0.524396 seconds.
--------------------------------------- With some magic
Elapsed time is 0.055082 seconds.
error_val =
0

MATLAB sum series function

I am very new to MATLAB. I am trying to implement the sum of the series 1 + x + x^2/2! + x^3/3! + ..., but I could not figure out how to do it. So far I have only summed plain numbers. Help please.
% a is a vector of numbers to add up
sum_a = 0;
for ii = 1:length(a)
sum_a = sum_a + a(ii);
end
sum_a
n = 0 : 10; % elements of the series
x = 2; % value of x
s = sum(x .^ n ./ factorial(n)); % sum
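Since this is just the truncated Taylor series of exp(x), a quick sanity check is to compare against exp -
s % 7.3890 for x = 2 and n = 0:10
exp(x) % 7.3891; the partial sum approaches this as more terms are added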
If the number of terms should be supplied by the user, the first line can instead be built from input:
n = 0:input('variable?')
Cheery's approach is perfectly valid when the number of terms of the series is small. For a large number of terms, a faster approach is as follows. It is more efficient because it avoids repeated multiplications (and the factorial calls):
m = 10;
x = 2;
result = 1+sum(cumprod(x./[1:m]));
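To see what cumprod builds: the k-th entry of cumprod(x./[1:m]) is (x/1)*(x/2)*...*(x/k) = x^k/k!, and the leading 1 accounts for the k = 0 term. For example -
x = 2; m = 3;
cumprod(x./[1:m]) % [2 2 1.3333], i.e. [x^1/1!, x^2/2!, x^3/3!]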
Example running time for m = 1000; x = 1;
tic
for k = 1:1e4
result = 1+sum(cumprod(x./[1:m]));
end
toc
tic
for k = 1:1e4
result = sum(x.^(0:m)./factorial(0:m));
end
toc
gives
Elapsed time is 1.572464 seconds.
Elapsed time is 2.999566 seconds.