MATLAB: slower execution despite fewer operations

I'm new to Matlab. This is my playground script:
function speedtest()
a = reshape(1:1:30000, 10000, 3);
tic;
for i = 1:100
a(:, [1, 2]) = bsxfun(@minus, a(:, [1, 2]), [1, 1]);
end
toc
tic;
for i = 1:100
a = bsxfun(@minus, a, [1, 1, 0]);
end
toc
end
And the execution time:
Elapsed time is 0.007709 seconds.
Elapsed time is 0.001803 seconds.
The first method performs fewer operations, but it runs much slower. Is this a vectorization issue? If so, why can't MATLAB "vectorize" my a(:, [1, 2]) selection?
Update:
As per @thewaywewalk, I put each statement into its own function, removed the loop, and used timeit. Here's the result:
% a(:, [1, 2]) = bsxfun(@minus, a(:, [1, 2]), [1, 1]);
1.0064e-04
% a = bsxfun(@minus, a, [1, 1, 0]);
6.4187e-05
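For reference, the timeit comparison can be set up roughly like this (a minimal sketch; the file and function names are made up):
% Sketch of the timeit measurement; save as speed_compare.m and run speed_compare.
function speed_compare()
a = reshape(1:1:30000, 10000, 3);
t_sliced = timeit(@() sliced_variant(a))   % a(:,[1,2]) = bsxfun(@minus, ...)
t_full   = timeit(@() full_variant(a))     % a = bsxfun(@minus, a, [1,1,0])
end

function b = sliced_variant(a)
b = a;
b(:, [1, 2]) = bsxfun(@minus, b(:, [1, 2]), [1, 1]);
end

function b = full_variant(a)
b = bsxfun(@minus, a, [1, 1, 0]);
end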

The overhead of the first approach comes from sub-matrix slicing. Changing it to
tic;
b=a(:,[1,2]);
for i = 1:100
b = bsxfun(@minus, b, [1, 1]);
end
a(:,[1,2])=b;
toc
makes it significantly faster.
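On R2016b or newer the same idea also works without bsxfun, using implicit expansion (a sketch):
% Implicit expansion (R2016b+): no bsxfun needed.
b = a(:, [1, 2]);        % work on a copy of the two columns
b = b - [1, 1];          % expands across rows
a(:, [1, 2]) = b;
% or, operating on the whole matrix at once:
a = a - [1, 1, 0];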

Related

Multiply n vectors of length p by n matrices of size pxp

I have n complex vectors of length p that I want to multiply by n complex matrices of size p-by-p. I am looking for the most efficient way to do this in MATLAB. If it matters, I am imagining that n is large and p is small.
An example using a loop (which I would like to avoid) is shown below.
N = 1e4;
p = 5;
A = randn(p, N); % N vectors of length p
B = randn(p, p, N); % N matrices of size pxp
C = zeros(p, N);
for k = 1:N
C(:, k) = B(:, :, k) * A(:, k);
end
It's been suggested that I might be able to achieve this efficiently using tensor functions, but I haven't been able to figure that out.
Here's a way using implicit expansion:
C = permute(sum(B.*permute(A, [3 1 2]), 2), [1 3 2]);
For older MATLAB versions (before R2016b) you need to rewrite it with bsxfun:
C = permute(sum(bsxfun(@times, B, permute(A, [3 1 2])), 2), [1 3 2]);
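A quick sanity check against the loop in the question (a sketch using small sizes; exact values will differ with random inputs):
% Compare the permute/sum one-liner against the loop.
N = 10; p = 5;
A = randn(p, N);
B = randn(p, p, N);
C_loop = zeros(p, N);
for k = 1:N
    C_loop(:, k) = B(:, :, k) * A(:, k);
end
C = permute(sum(B .* permute(A, [3 1 2]), 2), [1 3 2]);   % R2016b+
max(abs(C(:) - C_loop(:)))   % should be of the order of eps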
You can accomplish that in various ways:
A = rand(3, 3, 1e6);
B = rand(3, 1);
tic, C = zeros(3, size(A, 3));
for i = 1:size(A, 3)
C(:,i) = A(:,:,i)*B ;
end, toc
tic; C = reshape(reshape(permute(A,[2,1,3]),3,[]).'*B,3,[]); toc
tic; C = squeeze(sum(bsxfun(@times, A, reshape(B, 1, 3)), 2)); toc
On my system:
Elapsed time is 2.067629 seconds. % Loop
Elapsed time is 0.064164 seconds. % permute
Elapsed time is 0.145738 seconds. % sum(times())
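Before trusting the timings, it is worth checking that the three variants agree (a small sketch; variable names follow the code above):
% Verify the three approaches produce the same result (up to round-off).
A = rand(3, 3, 100);
B = rand(3, 1);
C1 = zeros(3, size(A, 3));
for i = 1:size(A, 3)
    C1(:, i) = A(:, :, i) * B;
end
C2 = reshape(reshape(permute(A, [2, 1, 3]), 3, []).' * B, 3, []);
C3 = squeeze(sum(bsxfun(@times, A, reshape(B, 1, 3)), 2));
max(abs(C1(:) - C2(:)))   % should be ~eps
max(abs(C1(:) - C3(:)))   % should be ~eps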

Fast multiplication of multiple matrices by multiple vectors

In MATLAB, I would like to multiply M vectors by L matrices, resulting in M x L new vectors. Specifically, say I have a matrix A of size N x M and a matrix B of size N x N x L; I would like to calculate a matrix C of size N x M x L where the result is exactly what the following slow code produces:
for m=1:M
for l=1:L
C(:,m,l)=B(:,:,l)*A(:,m);
end
end
but to achieve this efficiently (using native code rather than MATLAB looping).
We could abuse fast matrix multiplication here; we just need to rearrange dimensions. So, push the second dimension of B to the end and reshape to 2D such that the first two dims are merged. Perform matrix multiplication with A to get a 2D array; call it C. Now, C's first dim holds the merged dims from B, so split it back into their original two lengths by reshaping into a 3D array. Finally, push the second dim to the back with one more permute. This is the desired 3D output.
Hence, the implementation would be -
permute(reshape(reshape(permute(B,[1,3,2]),[],N)*A,N,L,[]),[1,3,2])
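Spelled out with intermediate variables (a sketch; the size comments assume the same N, M, L as above):
% Step-by-step version of the one-liner, with sizes annotated.
Bp  = permute(B, [1, 3, 2]);      % N x L x N   (second dim of B pushed to the end)
B2d = reshape(Bp, [], N);         % (N*L) x N   (first two dims merged)
C2d = B2d * A;                    % (N*L) x M   (one big matrix multiplication)
C3d = reshape(C2d, N, L, []);     % N x L x M   (merged dims split back)
out = permute(C3d, [1, 3, 2]);    % N x M x L   (desired layout)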
Benchmarking
Benchmarking code:
% Setup inputs
M = 150;
L = 150;
N = 150;
A = randn(N,M);
B = randn(N,N,L);
disp('----------------------- ORIGINAL LOOPY -------------------')
tic
C_loop = NaN(N,M,L);
for m=1:M
for l=1:L
C_loop(:,m,l)=B(:,:,l)*A(:,m);
end
end
toc
disp('----------------------- BSXFUN + PERMUTE -----------------')
% @Luis's soln
tic
C = permute(sum(bsxfun(@times, permute(B, [1 2 4 3]), ...
permute(A, [3 1 2])), 2), [1 3 4 2]);
toc
disp('----------------------- BSXFUN + MATRIX-MULT -------------')
% Proposed in this post
tic
out = permute(reshape(reshape(permute(B,[1,3,2]),[],N)*A,N,L,[]),[1,3,2]);
toc
Timings:
----------------------- ORIGINAL LOOPY -------------------
Elapsed time is 0.905811 seconds.
----------------------- BSXFUN + PERMUTE -----------------
Elapsed time is 0.883616 seconds.
----------------------- BSXFUN + MATRIX-MULT -------------
Elapsed time is 0.045331 seconds.
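For completeness, a quick cross-check that all three approaches produce the same numbers (a sketch reusing the variables from the benchmark):
% Cross-check the three results from the benchmark above.
max(abs(C_loop(:) - C(:)))      % bsxfun + permute vs. loop, should be ~eps
max(abs(C_loop(:) - out(:)))    % matrix-mult reshape vs. loop, should be ~eps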
You can do it with some permuting of dimensions and singleton expansion:
C = permute(sum(bsxfun(@times, permute(B, [1 2 4 3]), permute(A, [3 1 2])), 2), [1 3 4 2]);
Check:
% Example inputs:
M = 5;
L = 6;
N = 7;
A = randn(N,M);
B = randn(N,N,L);
% Output with bsxfun and permute:
C = permute(sum(bsxfun(@times, permute(B, [1 2 4 3]), permute(A, [3 1 2])), 2), [1 3 4 2]);
% Output with loops:
C_loop = NaN(N,M,L);
for m=1:M
for l=1:L
C_loop(:,m,l)=B(:,:,l)*A(:,m);
end
end
% Maximum relative error. Should be 0, or of the order of eps:
max_error = max(reshape(abs(C./C_loop),[],1)-1)

Efficient way to compute a tensor

Suppose c is a d-dimensional vector. I want to compute the following third-order tensor
where e_i stands for the i-th standard basis vector of the Euclidean space. Is there an efficient way to compute this? I am currently using the following for-loop and the Kruskal-tensor constructor ktensor from the Tensor Toolbox managed by Sandia National Labs:
x=ktensor({c,c,c});
I=eye(d);
for i=1:d
x = x + 2*c(i)*ktensor({I(:,i),I(:,i),I(:,i)});
end
for i=1:d
for j=1:d
x = x - c(i)*c(j)*(ktensor({I(:,i),I(:,i),I(:,j)})+ktensor({I(:,i),I(:,j),I(:,i)})+ktensor({I(:,i),I(:,j),I(:,j)}));
end
end
Here is a possibility.
I used an optimization for the second term, as it places values of c along the "diagonal" of the tensor.
For the first term, there isn't much room for optimization, as it is a dense multiplication, so bsxfun seems appropriate.
For the third term, I stick to bsxfun, but as the result is somewhat sparse, you may benefit from filling it "by hand" if the size of your matrix is large.
Here is the code:
dim = 10;
c = [1:dim]';
e = eye(dim);
x = zeros([dim, dim, dim]);
% initialize with second term
x(1:dim*(dim+1)+1:end) = 2 * c;
% add first term
x = x + bsxfun(@times, bsxfun(@times, c, shiftdim(c, -1)), shiftdim(c, -2));
% add third term
x = x - sum(sum(bsxfun(@times, shiftdim(c*c',-3), ...
bsxfun(@times, bsxfun(@times, permute(e, [1, 3, 4, 2, 5]), permute(e, [3, 1, 4, 2, 5])), permute(e, [3, 4, 1, 5, 2])) +...
bsxfun(@times, bsxfun(@times, permute(e, [1, 3, 4, 2, 5]), permute(e, [3, 1, 4, 5, 2])), permute(e, [3, 4, 1, 2, 5])) +...
bsxfun(@times, bsxfun(@times, permute(e, [1, 3, 4, 5, 2]), permute(e, [3, 1, 4, 2, 5])), permute(e, [3, 4, 1, 2, 5]))), 5), 4);
EDIT
A much more efficient (esp. memory-wise) computation of the third term:
ec = bsxfun(@times, e, c);
x = x - ...
bsxfun(@times, ec, shiftdim(c, -2)) -...
bsxfun(@times, c', reshape(ec, [dim, 1, dim])) -...
bsxfun(@times, c, reshape(ec, [1, dim, dim]));
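As a sanity check, assuming ktensor({a,b,c}) denotes the rank-1 tensor with entries a(i)*b(j)*c(k), the result can be compared entry by entry against the formula implied by the original loops (a sketch for small dim):
% Entry-wise reference implementation of the tensor (small dim only).
x_ref = zeros(dim, dim, dim);
for i = 1:dim
    for j = 1:dim
        for k = 1:dim
            x_ref(i,j,k) = c(i)*c(j)*c(k) ...             % first term
                + 2*c(i)*(i == j && j == k) ...           % second term (diagonal)
                - c(i)*c(k)*(i == j) ...                  % third term, e_i (x) e_i (x) e_j
                - c(i)*c(j)*(i == k) ...                  % third term, e_i (x) e_j (x) e_i
                - c(i)*c(j)*(j == k);                     % third term, e_i (x) e_j (x) e_j
        end
    end
end
max(abs(x(:) - x_ref(:)))   % should be 0, or ~eps for non-integer c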
You could try the Parallel Computing Toolbox, namely parfor loops.
x=ktensor({c,c,c});
I=eye(d);
y = zeros(d,d,d, d);
parfor i=1:d
y(:,:,:, i) = 2*c(i)*ktensor({I(:,i),I(:,i),I(:,i)});
end
x = x + sum(y, 4);
z = zeros(d,d,d, d,d);
parfor i=1:d
for j=1:d % only one layer of parallelization is allowed
z(:,:,:, i,j) = c(i)*c(j)*(ktensor({I(:,i),I(:,i),I(:,j)})+ktensor({I(:,i),I(:,j),I(:,i)})+ktensor({I(:,i),I(:,j),I(:,j)}));
end
end
x = x - sum(sum(z, 5), 4);
x % is your result
It just runs the untouched ktensor commands, but on separate workers, so the Toolbox takes care of running the code in parallel.
This is possible because each iteration is independent: for example, the term for indices (i+1, j+1) does not rely on the term for (i, j).
Depending on the number of cores (and hyperthreading) on your system, the speed-up can approach the number of cores.
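If no pool is open yet, one can be started explicitly before the parfor loops (a sketch; the worker count is only an example):
% Start a pool of workers before the parfor loops (worker count is an example).
if isempty(gcp('nocreate'))
    parpool('local', 4);   % or simply parpool to use the default profile
end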

running mean of a matrix in matlab

Given an n x N matrix A, I want to find the running mean along the rows of the matrix. For this I have done:
mean = cumsum(A, 2);
for k = 1:N
mean(:, k) = mean(:, k)/k;
end
but for large N this takes a while. Is there a more efficient way to do this in MATLAB?
Note: zeeMonkeez's solution is fastest according to some rough benchmarks at the end of my post.
How about
N = 1000;
A = rand(N, N);
m = cumsum(A, 2);
m1 = zeros(size(m));
tic
for j = 1:1000;
for k = 1:N
m1(:, k) = m(:, k)/k;
end
end
toc
Elapsed time is 6.971112 seconds.
tic
for j = 1:1000
n = repmat(1:N, N, 1);
m2 = m./n;
end
toc
Elapsed time is 2.471035 seconds.
Here, you transform your problem into a single element-wise matrix operation: instead of dividing column by column in a loop, you divide one matrix by another point-wise. The matrix you would like to divide by looks like this:
[1, 2, 3, ..., N;
 1, 2, 3, ..., N;
 ...
 1, 2, 3, ..., N]
which you can get using repmat.
EDIT: BENCHMARK
bsxfun as used by @zeeMonkeez is even faster, both for the above case (10% difference on my system) and for a larger matrix (N = 10000), in which case my version actually performs worst (35 s, vs. 30 s for the OP's version and 23 s for zeeMonkeez's solution).
On my machine, bsxfun is even faster:
N = 1000;
A = rand(N, N);
m = cumsum(A, 2);
tic
for j = 1:1000;
m2 = bsxfun(@rdivide, m, 1:N);
end
toc
Elapsed time is 1.555507 seconds.
bsxfun avoids having to allocate memory for the divisor as repmat does.
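On R2016b or newer, implicit expansion gives the same result without bsxfun (a minimal sketch):
% Implicit expansion (R2016b+): divides column k of m by k.
m2 = m ./ (1:N);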

Create a 3-dim matrix from two 2-dim matrices

I already have an N_1 x N_2 matrix A and an N_2 x N_3 matrix B.
I want to create an N_1 x N_2 x N_3 matrix C, such that C(i,j,k) = A(i,j)*B(j,k).
I was wondering if it is possible to create C using some MATLAB operation, instead of doing it element by element?
You can do the same thing as the OP's answer using bsxfun (which actually works internally using a similar method, but is a little bit cleaner):
C = bsxfun(@times, A, permute(B, [3 1 2]));
This is also quite a bit faster (bsxfun must do some magic internally - probably takes advantage of MATLAB's internal ability to do certain operations using multiple threads, or it might just be that permuting the smaller matrix is a lot faster, or some combination of similar factors):
>> N1 = 100; N2 = 20; N3 = 4; A = rand(N1, N2); B = rand(N2, N3);
>> tic; for n = 1:10000; C = repmat(A, [1, 1, size(B, 2)]) .* permute(repmat(B, [1, 1, size(A, 1)]), [3, 1, 2]); end; toc
Elapsed time is 2.827492 seconds.
>> tic; for n = 1:10000; C2 = bsxfun(@times, A, permute(B, [3 1 2])); end; toc
Elapsed time is 0.287665 seconds.
Edit: moving the permute inside the repmat shaves a little bit of time off, but it's still nowhere near as fast as bsxfun:
>> tic; for n = 1:10000; C = (repmat(A, [1 1 size(B, 2)]) .* repmat(permute(B, [3 1 2]), [size(A, 1) 1 1])); end; toc
Elapsed time is 2.563069 seconds.
Rather clumsy, but it seems to work:
C = repmat(A, [1, 1, size(B, 2)]) .* permute(repmat(B, [1, 1, size(A, 1)]), [3, 1, 2]);
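For completeness, on R2016b or newer the same result can be obtained with implicit expansion and verified against the element-by-element definition (a sketch):
% Implicit expansion (R2016b+) and a check against C(i,j,k) = A(i,j)*B(j,k).
N1 = 100; N2 = 20; N3 = 4;
A = rand(N1, N2);
B = rand(N2, N3);
C = A .* permute(B, [3 1 2]);          % N1 x N2 x N3
C_loop = zeros(N1, N2, N3);
for k = 1:N3
    for j = 1:N2
        C_loop(:, j, k) = A(:, j) * B(j, k);
    end
end
max(abs(C(:) - C_loop(:)))             % should be 0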