Related
I would like to do a function to generalize matrix multiplication. Basically, it should be able to do the standard matrix multiplication, but it should allow to change the two binary operators product/sum by any other function.
The goal is to be as efficient as possible, both in terms of CPU and memory. Of course, it will always be less efficient than A*B, but the operators flexibility is the point here.
Here are a few commands I could come up after reading various interesting threads:
A = randi(10, 2, 3);
B = randi(10, 3, 4);
% 1st method
C = sum(bsxfun(#mtimes, permute(A,[1 3 2]),permute(B,[3 2 1])), 3)
% Alternative: C = bsxfun(#(a,b) mtimes(a',b), A', permute(B, [1 3 2]))
% 2nd method
C = sum(bsxfun(#(a,b) a*b, permute(A,[1 3 2]),permute(B,[3 2 1])), 3)
% 3rd method (Octave-only)
C = sum(permute(A, [1 3 2]) .* permute(B, [3 2 1]), 3)
% 4th method (Octave-only): multiply nxm A with nx1xd B to create a nxmxd array
C = bsxfun(#(a, b) sum(times(a,b)), A', permute(B, [1 3 2]));
C = C2 = squeeze(C(1,:,:)); % sum and turn into mxd
The problem with methods 1-3 are that they will generate n matrices before collapsing them using sum(). 4 is better because it does the sum() inside the bsxfun, but bsxfun still generates n matrices (except that they are mostly empty, containing only a vector of non-zeros values being the sums, the rest is filled with 0 to match the dimensions requirement).
What I would like is something like the 4th method but without the useless 0 to spare memory.
Any idea?
Here is a slightly more polished version of the solution you posted, with some small improvements.
We check if we have more rows than columns or the other way around, and then do the multiplication accordingly by choosing either to multiply rows with matrices or matrices with columns (thus doing the least amount of loop iterations).
Note: This may not always be the best strategy (going by rows instead of by columns) even if there are less rows than columns; the fact that MATLAB arrays are stored in a column-major order in memory makes it more efficient to slice by columns, as the elements are stored consecutively. Whereas accessing rows involves traversing elements by strides (which is not cache-friendly -- think spatial locality).
Other than that, the code should handle double/single, real/complex, full/sparse (and errors where it is not a possible combination). It also respects empty matrices and zero-dimensions.
function C = my_mtimes(A, B, outFcn, inFcn)
% default arguments
if nargin < 4, inFcn = #times; end
if nargin < 3, outFcn = #sum; end
% check valid input
assert(ismatrix(A) && ismatrix(B), 'Inputs must be 2D matrices.');
assert(isequal(size(A,2),size(B,1)),'Inner matrix dimensions must agree.');
assert(isa(inFcn,'function_handle') && isa(outFcn,'function_handle'), ...
'Expecting function handles.')
% preallocate output matrix
M = size(A,1);
N = size(B,2);
if issparse(A)
args = {'like',A};
elseif issparse(B)
args = {'like',B};
else
args = {superiorfloat(A,B)};
end
C = zeros(M,N, args{:});
% compute matrix multiplication
% http://en.wikipedia.org/wiki/Matrix_multiplication#Inner_product
if M < N
% concatenation of products of row vectors with matrices
% A*B = [a_1*B ; a_2*B ; ... ; a_m*B]
for m=1:M
%C(m,:) = A(m,:) * B;
%C(m,:) = sum(bsxfun(#times, A(m,:)', B), 1);
C(m,:) = outFcn(bsxfun(inFcn, A(m,:)', B), 1);
end
else
% concatenation of products of matrices with column vectors
% A*B = [A*b_1 , A*b_2 , ... , A*b_n]
for n=1:N
%C(:,n) = A * B(:,n);
%C(:,n) = sum(bsxfun(#times, A, B(:,n)'), 2);
C(:,n) = outFcn(bsxfun(inFcn, A, B(:,n)'), 2);
end
end
end
Comparison
The function is no doubt slower throughout, but for larger sizes it is orders of magnitude worse than the built-in matrix-multiplication:
(tic/toc times in seconds)
(tested in R2014a on Windows 8)
size mtimes my_mtimes
____ __________ _________
400 0.0026398 0.20282
600 0.012039 0.68471
800 0.014571 1.6922
1000 0.026645 3.5107
2000 0.20204 28.76
4000 1.5578 221.51
Here is the test code:
sz = [10:10:100 200:200:1000 2000 4000];
t = zeros(numel(sz),2);
for i=1:numel(sz)
n = sz(i); disp(n)
A = rand(n,n);
B = rand(n,n);
tic
C = A*B;
t(i,1) = toc;
tic
D = my_mtimes(A,B);
t(i,2) = toc;
assert(norm(C-D) < 1e-6)
clear A B C D
end
semilogy(sz, t*1000, '.-')
legend({'mtimes','my_mtimes'}, 'Interpreter','none', 'Location','NorthWest')
xlabel('Size N'), ylabel('Time [msec]'), title('Matrix Multiplication')
axis tight
Extra
For completeness, below are two more naive ways to implement the generalized matrix multiplication (if you want to compare the performance, replace the last part of the my_mtimes function with either of these). I'm not even gonna bother posting their elapsed times :)
C = zeros(M,N, args{:});
for m=1:M
for n=1:N
%C(m,n) = A(m,:) * B(:,n);
%C(m,n) = sum(bsxfun(#times, A(m,:)', B(:,n)));
C(m,n) = outFcn(bsxfun(inFcn, A(m,:)', B(:,n)));
end
end
And another way (with a triple-loop):
C = zeros(M,N, args{:});
P = size(A,2); % = size(B,1);
for m=1:M
for n=1:N
for p=1:P
%C(m,n) = C(m,n) + A(m,p)*B(p,n);
%C(m,n) = plus(C(m,n), times(A(m,p),B(p,n)));
C(m,n) = outFcn([C(m,n) inFcn(A(m,p),B(p,n))]);
end
end
end
What to try next?
If you want to squeeze out more performance, you're gonna have to move to a C/C++ MEX-file to cut down on the overhead of interpreted MATLAB code. You can still take advantage of optimized BLAS/LAPACK routines by calling them from MEX-files (see the second part of this post for an example). MATLAB ships with Intel MKL library which frankly you cannot beat when it comes to linear algebra computations on Intel processors.
Others have already mentioned a couple of submissions on the File Exchange that implement general-purpose matrix routines as MEX-files (see #natan's answer). Those are especially effective if you link them against an optimized BLAS library.
Why not just exploit bsxfun's ability to accept an arbitrary function?
C = shiftdim(feval(f, (bsxfun(g, A.', permute(B,[1 3 2])))), 1);
Here
f is the outer function (corrresponding to sum in the matrix-multiplication case). It should accept a 3D array of arbitrary size mxnxp and operate along its columns to return a 1xmxp array.
g is the inner function (corresponding to product in the matrix-multiplication case). As per bsxfun, it should accept as input either two column vectors of the same size, or one column vector and one scalar, and return as output a column vector of the same size as the input(s).
This works in Matlab. I haven't tested in Octave.
Example 1: Matrix-multiplication:
>> f = #sum; %// outer function: sum
>> g = #times; %// inner function: product
>> A = [1 2 3; 4 5 6];
>> B = [10 11; -12 -13; 14 15];
>> C = shiftdim(feval(f, (bsxfun(g, A.', permute(B,[1 3 2])))), 1)
C =
28 30
64 69
Check:
>> A*B
ans =
28 30
64 69
Example 2: Consider the above two matrices with
>> f = #(x,y) sum(abs(x)); %// outer function: sum of absolute values
>> g = #(x,y) max(x./y, y./x); %// inner function: "symmetric" ratio
>> C = shiftdim(feval(f, (bsxfun(g, A.', permute(B,[1 3 2])))), 1)
C =
14.8333 16.1538
5.2500 5.6346
Check: manually compute C(1,2):
>> sum(abs( max( (A(1,:))./(B(:,2)).', (B(:,2)).'./(A(1,:)) ) ))
ans =
16.1538
Without diving into the details, there are tools such as mtimesx and MMX that are fast general purpose matrix and scalar operations routines. You can look into their code and adapt them to your needs.
It would most likely be faster than matlab's bsxfun.
After examination of several processing functions like bsxfun, it seems it won't be possible to do a direct matrix multiplication using these (what I mean by direct is that the temporary products are not stored in memory but summed ASAP and then other sum-products are processed), because they have a fixed size output (either the same as input, either with bsxfun singleton expansion the cartesian product of dimensions of the two inputs). It's however possible to trick Octave a bit (which does not work with MatLab who checks the output dimensions):
C = bsxfun(#(a,b) sum(bsxfun(#times, a, B))', A', sparse(1, size(A,1)))
C = bsxfun(#(a,b) sum(bsxfun(#times, a, B))', A', zeros(1, size(A,1), 2))(:,:,2)
However do not use them because the outputted values are not reliable (Octave can mangle or even delete them and return 0!).
So for now on I am just implementing a semi-vectorized version, here's my function:
function C = genmtimes(A, B, outop, inop)
% C = genmtimes(A, B, inop, outop)
% Generalized matrix multiplication between A and B. By default, standard sum-of-products matrix multiplication is operated, but you can change the two operators (inop being the element-wise product and outop the sum).
% Speed note: about 100-200x slower than A*A' and about 3x slower when A is sparse, so use this function only if you want to use a different set of inop/outop than the standard matrix multiplication.
if ~exist('inop', 'var')
inop = #times;
end
if ~exist('outop', 'var')
outop = #sum;
end
[n, m] = size(A);
[m2, o] = size(B);
if m2 ~= m
error('nonconformant arguments (op1 is %ix%i, op2 is %ix%i)\n', n, m, m2, o);
end
C = [];
if issparse(A) || issparse(B)
C = sparse(o,n);
else
C = zeros(o,n);
end
A = A';
for i=1:n
C(:,i) = outop(bsxfun(inop, A(:,i), B))';
end
C = C';
end
Tested with both sparse and normal matrices: the performance gap is a lot less with sparse matrices (3x slower) than with normal matrices (~100x slower).
I think this is slower than bsxfun implementations, but at least it doesn't overflow memory:
A = randi(10, 1000);
C = genmtimes(A, A');
If anyone has any better to offer, I'm still looking for a better alternative!
Can anyone help vectorize this Matlab code? The specific problem is the sum and bessel function with vector inputs.
Thank you!
N = 3;
rho_g = linspace(1e-3,1,N);
phi_g = linspace(0,2*pi,N);
n = 1:3;
tau = [1 2.*ones(1,length(n)-1)];
for ii = 1:length(rho_g)
for jj = 1:length(phi_g)
% Coordinates
rho_o = rho_g(ii);
phi_o = phi_g(jj);
% factors
fc = cos(n.*(phi_o-phi_s));
fs = sin(n.*(phi_o-phi_s));
Ez_t(ii,jj) = sum(tau.*besselj(n,k(3)*rho_s).*besselh(n,2,k(3)*rho_o).*fc);
end
end
You could try to vectorize this code, which might be possible with some bsxfun or so, but it would be hard to understand code, and it is the question if it would run any faster, since your code already uses vector math in the inner loop (even though your vectors only have length 3). The resulting code would become very difficult to read, so you or your colleague will have no idea what it does when you have a look at it in 2 years time.
Before wasting time on vectorization, it is much more important that you learn about loop invariant code motion, which is easy to apply to your code. Some observations:
you do not use fs, so remove that.
the term tau.*besselj(n,k(3)*rho_s) does not depend on any of your loop variables ii and jj, so it is constant. Calculate it once before your loop.
you should probably pre-allocate the matrix Ez_t.
the only terms that change during the loop are fc, which depends on jj, and besselh(n,2,k(3)*rho_o), which depends on ii. I guess that the latter costs much more time to calculate, so it better to not calculate this N*N times in the inner loop, but only N times in the outer loop. If the calculation based on jj would take more time, you could swap the for-loops over ii and jj, but that does not seem to be the case here.
The result code would look something like this (untested):
N = 3;
rho_g = linspace(1e-3,1,N);
phi_g = linspace(0,2*pi,N);
n = 1:3;
tau = [1 2.*ones(1,length(n)-1)];
% constant part, does not depend on ii and jj, so calculate only once!
temp1 = tau.*besselj(n,k(3)*rho_s);
Ez_t = nan(length(rho_g), length(phi_g)); % preallocate space
for ii = 1:length(rho_g)
% calculate stuff that depends on ii only
rho_o = rho_g(ii);
temp2 = besselh(n,2,k(3)*rho_o);
for jj = 1:length(phi_g)
phi_o = phi_g(jj);
fc = cos(n.*(phi_o-phi_s));
Ez_t(ii,jj) = sum(temp1.*temp2.*fc);
end
end
Initialization -
N = 3;
rho_g = linspace(1e-3,1,N);
phi_g = linspace(0,2*pi,N);
n = 1:3;
tau = [1 2.*ones(1,length(n)-1)];
Nested loops form (Copy from your code and shown here for comparison only) -
for ii = 1:length(rho_g)
for jj = 1:length(phi_g)
% Coordinates
rho_o = rho_g(ii);
phi_o = phi_g(jj);
% factors
fc = cos(n.*(phi_o-phi_s));
fs = sin(n.*(phi_o-phi_s));
Ez_t(ii,jj) = sum(tau.*besselj(n,k(3)*rho_s).*besselh(n,2,k(3)*rho_o).*fc);
end
end
Vectorized solution -
%%// Term - 1
term1 = repmat(tau.*besselj(n,k(3)*rho_s),[N*N 1]);
%%// Term - 2
[n1,rho_g1] = meshgrid(n,rho_g);
term2_intm = besselh(n1,2,k(3)*rho_g1);
term2 = transpose(reshape(repmat(transpose(term2_intm),[N 1]),N,N*N));
%%// Term -3
angle1 = repmat(bsxfun(#times,bsxfun(#minus,phi_g,phi_s')',n),[N 1]);
fc = cos(angle1);
%%// Output
Ez_t = sum(term1.*term2.*fc,2);
Ez_t = transpose(reshape(Ez_t,N,N));
Points to note about this vectorization or code simplification –
‘fs’ doesn’t change the output of the script, Ez_t, so it could be removed for now.
The output seems to be ‘Ez_t’,which requires three basic terms in the code as –
tau.*besselj(n,k(3)*rho_s), besselh(n,2,k(3)*rho_o) and fc. These are calculated separately for vectorization as terms1,2 and 3 respectively.
All these three terms appear to be of 1xN sizes. Our aim thus becomes to calculate these three terms without loops. Now, the two loops run for N times each, thus giving us a total loop count of NxN. Thus, we must have NxN times the data in each such term as compared to when these terms were inside the nested loops.
This is basically the essence of the vectorization done here, as the three terms are represented by ‘term1’,’term2’ and ‘fc’ itself.
In order to give a self-contained answer, I'll copy the original initialization
N = 3;
rho_g = linspace(1e-3,1,N);
phi_g = linspace(0,2*pi,N);
n = 1:3;
tau = [1 2.*ones(1,length(n)-1)];
and generate some missing data (k(3) and rho_s and phi_s in the dimension of n)
rho_s = rand(size(n));
phi_s = rand(size(n));
k(3) = rand(1);
then you can compute the same Ez_t with multidimensional arrays:
[RHO_G, PHI_G, N] = meshgrid(rho_g, phi_g, n);
[~, ~, TAU] = meshgrid(rho_g, phi_g, tau);
[~, ~, RHO_S] = meshgrid(rho_g, phi_g, rho_s);
[~, ~, PHI_S] = meshgrid(rho_g, phi_g, phi_s);
FC = cos(N.*(PHI_G - PHI_S));
FS = sin(N.*(PHI_G - PHI_S)); % not used
EZ_T = sum(TAU.*besselj(N, k(3)*RHO_S).*besselh(N, 2, k(3)*RHO_G).*FC, 3).';
You can check afterwards that both matrices are the same
norm(Ez_t - EZ_T)
I am multiplying two matrices A (of size nxn) and B (of size nxm). The simplest way in matlab will be
n = 1000;
m = 500;
for k=1:n
A(k, :) = (1:n)+k;
end
B = rand(n, m);
C = A*B; % C of the size nxm
however, this code occupies too much of memory when n and/or m too big. So I am looking for a vectorize version of array to implement that
n = 1000;
m = 500;
B = rand(n, m);
func0 = #(k, colv) [(1:n)+k]*colv;
func1 = #(V) arrayfun(func0, 1:n, V);
func1(B)
but it doesn't work. It said the dimension doesn't match up. Anybody has any suggestion?
I would not use anything fancy for this, just break down the linear algebra being performed.
C = zeros(n,m);
for k = 1:n
C(k,:) = ((1:n)+k) * B;
end
Or, slightly more verbosely
C = zeros(n,m);
for k = 1:n
A_singleRow = ((1:n)+k);
C(k,:) = A_singleRow* B;
end
For crazy-big sizes (which it sounds like you have), try reformulating the problem so that you can iterate on columns, rather than rows. (Matlab uses column-major matrix storage, which means that elements in the same column are adjacent in memory. Usually thining about this falls into the realm of over-optimization, but maybe not for you.)
For example, you could construct Ctranspose as follows:
Ctranspose = zeros(m,n); %Note reversed order of n, m
Btranspose = B'; %Of course you may want to just create Btranspose first
for k = 1:n
A_singleRowAsColumn = ((1:n)'+k);
Ctranspose(:,k) = Btranspose * A_singleRowAsColumn;
end
The tools arrayfun, cellfun are very useful to functionalize a for loop, which can be used to make code more clear. However they are not generally useful when trying to squeeze performance. Even if the anonymous function/arrayfun implementation was debugged, I suspect it would require roughly the same memory usage.
I have the following code:
A = rand(N1,N2);
b = rand(1,N1);
B = zeros(N1,N2);
for i=1:N1
for j=1:N2
B(i,j) = A(i,j)*b(i);
end
end
The question is how to write it in vector operation form? Something like B(:,:) = A(:,:).*b(:).
Simple use for bsxfun:
B = bsxfun(#times, A, b')
You can also try:
B = A*.(repmat(b,N2,1))';
Here, firstly you produce N2 repeated version of vector b and multiply it with A in an elementwise manner
Suppose I have an AxBxC matrix X and a BxD matrix Y.
Is there a non-loop method by which I can multiply each of the C AxB matrices with Y?
As a personal preference, I like my code to be as succinct and readable as possible.
Here's what I would have done, though it doesn't meet your 'no-loops' requirement:
for m = 1:C
Z(:,:,m) = X(:,:,m)*Y;
end
This results in an A x D x C matrix Z.
And of course, you can always pre-allocate Z to speed things up by using Z = zeros(A,D,C);.
You can do this in one line using the functions NUM2CELL to break the matrix X into a cell array and CELLFUN to operate across the cells:
Z = cellfun(#(x) x*Y,num2cell(X,[1 2]),'UniformOutput',false);
The result Z is a 1-by-C cell array where each cell contains an A-by-D matrix. If you want Z to be an A-by-D-by-C matrix, you can use the CAT function:
Z = cat(3,Z{:});
NOTE: My old solution used MAT2CELL instead of NUM2CELL, which wasn't as succinct:
[A,B,C] = size(X);
Z = cellfun(#(x) x*Y,mat2cell(X,A,B,ones(1,C)),'UniformOutput',false);
Here's a one-line solution (two if you want to split into 3rd dimension):
A = 2;
B = 3;
C = 4;
D = 5;
X = rand(A,B,C);
Y = rand(B,D);
%# calculate result in one big matrix
Z = reshape(reshape(permute(X, [2 1 3]), [A B*C]), [B A*C])' * Y;
%'# split into third dimension
Z = permute(reshape(Z',[D A C]),[2 1 3]);
Hence now: Z(:,:,i) contains the result of X(:,:,i) * Y
Explanation:
The above may look confusing, but the idea is simple.
First I start by take the third dimension of X and do a vertical concatenation along the first dim:
XX = cat(1, X(:,:,1), X(:,:,2), ..., X(:,:,C))
... the difficulty was that C is a variable, hence you can't generalize that expression using cat or vertcat. Next we multiply this by Y:
ZZ = XX * Y;
Finally I split it back into the third dimension:
Z(:,:,1) = ZZ(1:2, :);
Z(:,:,2) = ZZ(3:4, :);
Z(:,:,3) = ZZ(5:6, :);
Z(:,:,4) = ZZ(7:8, :);
So you can see it only requires one matrix multiplication, but you have to reshape the matrix before and after.
I'm approaching the exact same issue, with an eye for the most efficient method. There are roughly three approaches that i see around, short of using outside libraries (i.e., mtimesx):
Loop through slices of the 3D matrix
repmat-and-permute wizardry
cellfun multiplication
I recently compared all three methods to see which was quickest. My intuition was that (2) would be the winner. Here's the code:
% generate data
A = 20;
B = 30;
C = 40;
D = 50;
X = rand(A,B,C);
Y = rand(B,D);
% ------ Approach 1: Loop (via #Zaid)
tic
Z1 = zeros(A,D,C);
for m = 1:C
Z1(:,:,m) = X(:,:,m)*Y;
end
toc
% ------ Approach 2: Reshape+Permute (via #Amro)
tic
Z2 = reshape(reshape(permute(X, [2 1 3]), [A B*C]), [B A*C])' * Y;
Z2 = permute(reshape(Z2',[D A C]),[2 1 3]);
toc
% ------ Approach 3: cellfun (via #gnovice)
tic
Z3 = cellfun(#(x) x*Y,num2cell(X,[1 2]),'UniformOutput',false);
Z3 = cat(3,Z3{:});
toc
All three approaches produced the same output (phew!), but, surprisingly, the loop was the fastest:
Elapsed time is 0.000418 seconds.
Elapsed time is 0.000887 seconds.
Elapsed time is 0.001841 seconds.
Note that the times can vary quite a lot from one trial to another, and sometimes (2) comes out the slowest. These differences become more dramatic with larger data. But with much bigger data, (3) beats (2). The loop method is still best.
% pretty big data...
A = 200;
B = 300;
C = 400;
D = 500;
Elapsed time is 0.373831 seconds.
Elapsed time is 0.638041 seconds.
Elapsed time is 0.724581 seconds.
% even bigger....
A = 200;
B = 200;
C = 400;
D = 5000;
Elapsed time is 4.314076 seconds.
Elapsed time is 11.553289 seconds.
Elapsed time is 5.233725 seconds.
But the loop method can be slower than (2), if the looped dimension is much larger than the others.
A = 2;
B = 3;
C = 400000;
D = 5;
Elapsed time is 0.780933 seconds.
Elapsed time is 0.073189 seconds.
Elapsed time is 2.590697 seconds.
So (2) wins by a big factor, in this (maybe extreme) case. There may not be an approach that is optimal in all cases, but the loop is still pretty good, and best in many cases. It is also best in terms of readability. Loop away!
Nope. There are several ways, but it always comes out in a loop, direct or indirect.
Just to please my curiosity, why would you want that anyway ?
To answer the question, and for readability, please see:
ndmult, by ajuanpi (Juan Pablo Carbajal), 2013, GNU GPL
Input
2 arrays
dim
Example
nT = 100;
t = 2*pi*linspace (0,1,nT)’;
# 2 experiments measuring 3 signals at nT timestamps
signals = zeros(nT,3,2);
signals(:,:,1) = [sin(2*t) cos(2*t) sin(4*t).^2];
signals(:,:,2) = [sin(2*t+pi/4) cos(2*t+pi/4) sin(4*t+pi/6).^2];
sT(:,:,1) = signals(:,:,1)’;
sT(:,:,2) = signals(:,:,2)’;
G = ndmult (signals,sT,[1 2]);
Source
Original source. I added inline comments.
function M = ndmult (A,B,dim)
dA = dim(1);
dB = dim(2);
# reshape A into 2d
sA = size (A);
nA = length (sA);
perA = [1:(dA-1) (dA+1):(nA-1) nA dA](1:nA);
Ap = permute (A, perA);
Ap = reshape (Ap, prod (sA(perA(1:end-1))), sA(perA(end)));
# reshape B into 2d
sB = size (B);
nB = length (sB);
perB = [dB 1:(dB-1) (dB+1):(nB-1) nB](1:nB);
Bp = permute (B, perB);
Bp = reshape (Bp, sB(perB(1)), prod (sB(perB(2:end))));
# multiply
M = Ap * Bp;
# reshape back to original format
s = [sA(perA(1:end-1)) sB(perB(2:end))];
M = squeeze (reshape (M, s));
endfunction
I highly recommend you use the MMX toolbox of matlab. It can multiply n-dimensional matrices as fast as possible.
The advantages of MMX are:
It is easy to use.
Multiply n-dimensional matrices (actually it can multiply arrays of 2-D matrices)
It performs other matrix operations (transpose, Quadratic Multiply, Chol decomposition and more)
It uses C compiler and multi-thread computation for speed up.
For this problem, you just need to write this command:
C=mmx('mul',X,Y);
here is a benchmark for all possible methods. For more detail refer to this question.
1.6571 # FOR-loop
4.3110 # ARRAYFUN
3.3731 # NUM2CELL/FOR-loop/CELL2MAT
2.9820 # NUM2CELL/CELLFUN/CELL2MAT
0.0244 # Loop Unrolling
0.0221 # MMX toolbox <===================
I would like to share my answer to the problems of:
1) making the tensor product of two tensors (of any valence);
2) making the contraction of two tensors along any dimension.
Here are my subroutines for the first and second tasks:
1) tensor product:
function [C] = tensor(A,B)
C = squeeze( reshape( repmat(A(:), 1, numel(B)).*B(:).' , [size(A),size(B)] ) );
end
2) contraction:
Here A and B are the tensors to be contracted along the dimesions i and j respectively. The lengths of these dimensions should be equal, of course. There's no check for this (this would obscure the code) but apart from this it works well.
function [C] = tensorcontraction(A,B, i,j)
sa = size(A);
La = length(sa);
ia = 1:La;
ia(i) = [];
ia = [ia i];
sb = size(B);
Lb = length(sb);
ib = 1:Lb;
ib(j) = [];
ib = [j ib];
% making the i-th dimension the last in A
A1 = permute(A, ia);
% making the j-th dimension the first in B
B1 = permute(B, ib);
% making both A and B 2D-matrices to make use of the
% matrix multiplication along the second dimension of A
% and the first dimension of B
A2 = reshape(A1, [],sa(i));
B2 = reshape(B1, sb(j),[]);
% here's the implicit implication that sa(i) == sb(j),
% otherwise - crash
C2 = A2*B2;
% back to the original shape with the exception
% of dimensions along which we've just contracted
sa(i) = [];
sb(j) = [];
C = squeeze( reshape( C2, [sa,sb] ) );
end
Any critics?
I would think recursion, but that's the only other non- loop method you can do
You could "unroll" the loop, ie write out all the multiplications sequentially that would occur in the loop