I have a matrix A of dimension N by p and another matrix B of dimension N by q. What I want is a matrix C of dimension N by pq such that
C(i,:) = kron(A(i,:), B(i,:));
If N is large, looping over the N rows may take quite a long time. So currently I am augmenting A and B appropriately (combining repmat, permute and reshape) to turn each into a matrix of dimension N by pq, and then forming C with something like
C = A_aug .* B_aug;
Any better idea?
Check out some bsxfun + permute + reshape magic -
out = reshape(bsxfun(@times,permute(A,[1 3 2]),B),size(A,1),[])
Benchmarking & Verification
Benchmarking code -
%// Setup inputs
N = 200;
p = 190;
q = 180;
A = rand(N,p);
B = rand(N,q);
disp('--------------------------------------- Without magic')
tic
C = zeros(size(A,1),size(A,2)*size(B,2));
for i = 1:size(A,1)
    C(i,:) = kron(A(i,:), B(i,:));
end
toc
disp('--------------------------------------- With some magic')
tic
out = reshape(bsxfun(@times,permute(A,[1 3 2]),B),size(A,1),[]);
toc
error_val = max(abs(C(:)-out(:)))
Output -
--------------------------------------- Without magic
Elapsed time is 0.524396 seconds.
--------------------------------------- With some magic
Elapsed time is 0.055082 seconds.
error_val =
0
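On MATLAB R2016b or later (and in Octave, which broadcasts automatically), element-wise operators expand singleton dimensions implicitly, so the same idea can likely be written without bsxfun. Here is a minimal sketch on a small input that can be checked by hand; the names out_bsx and out_imp are made up for this example.

```matlab
% Row-wise Kronecker product on a tiny example
A = [1 2; 3 4];          % N-by-p, here 2-by-2
B = [10 20 30; 1 2 3];   % N-by-q, here 2-by-3

% bsxfun version from above
out_bsx = reshape(bsxfun(@times, permute(A,[1 3 2]), B), size(A,1), []);

% implicit-expansion version (assumes R2016b+ or Octave)
out_imp = reshape(permute(A,[1 3 2]) .* B, size(A,1), []);

% reference: explicit loop over rows
C = zeros(size(A,1), size(A,2)*size(B,2));
for i = 1:size(A,1)
    C(i,:) = kron(A(i,:), B(i,:));
end
disp(C)   % rows: [10 20 30 20 40 60] and [3 6 9 4 8 12]
```

Within each row, the index into B varies fastest, which is why A is the matrix pushed into the third dimension before the reshape.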
I would like to write a function that generalizes matrix multiplication. Basically, it should be able to do standard matrix multiplication, but it should allow replacing the two binary operators (product and sum) with any other function.
The goal is to be as efficient as possible, both in terms of CPU and memory. Of course, it will always be less efficient than A*B, but operator flexibility is the point here.
Here are a few commands I came up with after reading various interesting threads:
A = randi(10, 2, 3);
B = randi(10, 3, 4);
% 1st method
C = sum(bsxfun(@mtimes, permute(A,[1 3 2]),permute(B,[3 2 1])), 3)
% Alternative: C = bsxfun(@(a,b) mtimes(a',b), A', permute(B, [1 3 2]))
% 2nd method
C = sum(bsxfun(@(a,b) a*b, permute(A,[1 3 2]),permute(B,[3 2 1])), 3)
% 3rd method (Octave-only)
C = sum(permute(A, [1 3 2]) .* permute(B, [3 2 1]), 3)
% 4th method (Octave-only): multiply nxm A with nx1xd B to create a nxmxd array
C = bsxfun(@(a, b) sum(times(a,b)), A', permute(B, [1 3 2]));
C = C2 = squeeze(C(1,:,:)); % sum and turn into mxd
The problem with methods 1-3 is that they generate n matrices before collapsing them with sum(). Method 4 is better because it does the sum() inside the bsxfun, but bsxfun still generates n matrices (mostly empty: each contains only one vector of non-zero values, the sums, and the rest is filled with 0 to satisfy the dimension requirements).
What I would like is something like the 4th method but without the useless zeros, to save memory.
Any idea?
Here is a slightly more polished version of the solution you posted, with some small improvements.
We check if we have more rows than columns or the other way around, and then do the multiplication accordingly by choosing either to multiply rows with matrices or matrices with columns (thus doing the least amount of loop iterations).
Note: This may not always be the best strategy (going by rows instead of by columns) even if there are fewer rows than columns. MATLAB arrays are stored in column-major order in memory, which makes it more efficient to slice by columns, as the elements are stored consecutively, whereas accessing rows involves traversing elements by strides (which is not cache-friendly; think spatial locality).
Other than that, the code should handle double/single, real/complex, full/sparse (and errors where it is not a possible combination). It also respects empty matrices and zero-dimensions.
function C = my_mtimes(A, B, outFcn, inFcn)
    % default arguments
    if nargin < 4, inFcn = @times; end
    if nargin < 3, outFcn = @sum; end

    % check valid input
    assert(ismatrix(A) && ismatrix(B), 'Inputs must be 2D matrices.');
    assert(isequal(size(A,2),size(B,1)),'Inner matrix dimensions must agree.');
    assert(isa(inFcn,'function_handle') && isa(outFcn,'function_handle'), ...
        'Expecting function handles.')

    % preallocate output matrix
    M = size(A,1);
    N = size(B,2);
    if issparse(A)
        args = {'like',A};
    elseif issparse(B)
        args = {'like',B};
    else
        args = {superiorfloat(A,B)};
    end
    C = zeros(M,N, args{:});

    % compute matrix multiplication
    % http://en.wikipedia.org/wiki/Matrix_multiplication#Inner_product
    if M < N
        % concatenation of products of row vectors with matrices
        % A*B = [a_1*B ; a_2*B ; ... ; a_m*B]
        for m=1:M
            %C(m,:) = A(m,:) * B;
            %C(m,:) = sum(bsxfun(@times, A(m,:)', B), 1);
            C(m,:) = outFcn(bsxfun(inFcn, A(m,:)', B), 1);
        end
    else
        % concatenation of products of matrices with column vectors
        % A*B = [A*b_1 , A*b_2 , ... , A*b_n]
        for n=1:N
            %C(:,n) = A * B(:,n);
            %C(:,n) = sum(bsxfun(@times, A, B(:,n)'), 2);
            C(:,n) = outFcn(bsxfun(inFcn, A, B(:,n)'), 2);
        end
    end
end
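One caveat when passing other reductions: the function calls outFcn(X, dim), which matches sum and prod, but min and max take the dimension as their third argument. A small wrapper handle seems to be the simplest workaround. The sketch below inlines the row loop from my_mtimes so that it runs on its own; the wrapper handle and the min-plus example are assumptions added for illustration, not part of the original answer.

```matlab
% Min-plus product: C(i,j) = min over k of A(i,k) + B(k,j)
A = [0 3; 2 0];
B = [0 1; 4 0];

inFcn  = @plus;
outFcn = @(x, dim) min(x, [], dim);   % hypothetical wrapper: min needs [] before dim

C = zeros(size(A,1), size(B,2));
for m = 1:size(A,1)
    % same statement as in the row branch of my_mtimes
    C(m,:) = outFcn(bsxfun(inFcn, A(m,:)', B), 1);
end
disp(C)   % expected: [0 1; 2 0]
```

Passing @min directly would instead compute min(X, 1), the element-wise minimum against the scalar 1, and would fail or give wrong values without any warning about the dimension.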
Comparison
The function is no doubt slower throughout, but for larger sizes it is orders of magnitude worse than the built-in matrix-multiplication:
(tic/toc times in seconds; tested in R2014a on Windows 8)

size    mtimes       my_mtimes
____    _________    _________
 400    0.0026398      0.20282
 600    0.012039       0.68471
 800    0.014571       1.6922
1000    0.026645       3.5107
2000    0.20204       28.76
4000    1.5578       221.51
Here is the test code:
sz = [10:10:100 200:200:1000 2000 4000];
t = zeros(numel(sz),2);
for i=1:numel(sz)
    n = sz(i); disp(n)
    A = rand(n,n);
    B = rand(n,n);
    tic
    C = A*B;
    t(i,1) = toc;
    tic
    D = my_mtimes(A,B);
    t(i,2) = toc;
    assert(norm(C-D) < 1e-6)
    clear A B C D
end
semilogy(sz, t*1000, '.-')
legend({'mtimes','my_mtimes'}, 'Interpreter','none', 'Location','NorthWest')
xlabel('Size N'), ylabel('Time [msec]'), title('Matrix Multiplication')
axis tight
Extra
For completeness, below are two more naive ways to implement the generalized matrix multiplication (if you want to compare the performance, replace the last part of the my_mtimes function with either of these). I'm not even gonna bother posting their elapsed times :)
C = zeros(M,N, args{:});
for m=1:M
    for n=1:N
        %C(m,n) = A(m,:) * B(:,n);
        %C(m,n) = sum(bsxfun(@times, A(m,:)', B(:,n)));
        C(m,n) = outFcn(bsxfun(inFcn, A(m,:)', B(:,n)));
    end
end
And another way (with a triple-loop):
C = zeros(M,N, args{:});
P = size(A,2); % = size(B,1);
for m=1:M
    for n=1:N
        for p=1:P
            %C(m,n) = C(m,n) + A(m,p)*B(p,n);
            %C(m,n) = plus(C(m,n), times(A(m,p),B(p,n)));
            C(m,n) = outFcn([C(m,n) inFcn(A(m,p),B(p,n))]);
        end
    end
end
What to try next?
If you want to squeeze out more performance, you're gonna have to move to a C/C++ MEX-file to cut down on the overhead of interpreted MATLAB code. You can still take advantage of optimized BLAS/LAPACK routines by calling them from MEX-files (see the second part of this post for an example). MATLAB ships with Intel MKL library which frankly you cannot beat when it comes to linear algebra computations on Intel processors.
Others have already mentioned a couple of submissions on the File Exchange that implement general-purpose matrix routines as MEX-files (see @natan's answer). Those are especially effective if you link them against an optimized BLAS library.
Why not just exploit bsxfun's ability to accept an arbitrary function?
C = shiftdim(feval(f, (bsxfun(g, A.', permute(B,[1 3 2])))), 1);
Here
f is the outer function (corresponding to sum in the matrix-multiplication case). It should accept a 3D array of arbitrary size mxnxp and operate along its columns to return a 1xnxp array.
g is the inner function (corresponding to product in the matrix-multiplication case). As per bsxfun, it should accept as input either two column vectors of the same size, or one column vector and one scalar, and return as output a column vector of the same size as the input(s).
This works in Matlab. I haven't tested in Octave.
Example 1: Matrix-multiplication:
>> f = @sum; %// outer function: sum
>> g = @times; %// inner function: product
>> A = [1 2 3; 4 5 6];
>> B = [10 11; -12 -13; 14 15];
>> C = shiftdim(feval(f, (bsxfun(g, A.', permute(B,[1 3 2])))), 1)
C =
28 30
64 69
Check:
>> A*B
ans =
28 30
64 69
Example 2: Consider the above two matrices with
>> f = @(x,y) sum(abs(x)); %// outer function: sum of absolute values
>> g = @(x,y) max(x./y, y./x); %// inner function: "symmetric" ratio
>> C = shiftdim(feval(f, (bsxfun(g, A.', permute(B,[1 3 2])))), 1)
C =
14.8333 16.1538
5.2500 5.6346
Check: manually compute C(1,2):
>> sum(abs( max( (A(1,:))./(B(:,2)).', (B(:,2)).'./(A(1,:)) ) ))
ans =
16.1538
Without diving into the details, there are tools such as mtimesx and MMX that are fast general purpose matrix and scalar operations routines. You can look into their code and adapt them to your needs.
They would most likely be faster than MATLAB's bsxfun.
After examining several processing functions such as bsxfun, it seems it won't be possible to do a direct matrix multiplication with them (by direct I mean that the temporary products are not stored in memory but summed as soon as possible, before the next sum-products are processed), because they have a fixed-size output (either the same size as the input or, with bsxfun singleton expansion, the Cartesian product of the dimensions of the two inputs). It is however possible to trick Octave a bit (this does not work in MATLAB, which checks the output dimensions):
C = bsxfun(@(a,b) sum(bsxfun(@times, a, B))', A', sparse(1, size(A,1)))
C = bsxfun(@(a,b) sum(bsxfun(@times, a, B))', A', zeros(1, size(A,1), 2))(:,:,2)
However, do not use them, because the output values are not reliable (Octave can mangle or even delete them and return 0!).
So for now I am just implementing a semi-vectorized version. Here's my function:
function C = genmtimes(A, B, outop, inop)
% C = genmtimes(A, B, outop, inop)
% Generalized matrix multiplication between A and B. By default, standard sum-of-products matrix multiplication is operated, but you can change the two operators (inop being the element-wise product and outop the sum).
% Speed note: about 100-200x slower than A*A' and about 3x slower when A is sparse, so use this function only if you want to use a different set of inop/outop than the standard matrix multiplication.

if ~exist('inop', 'var')
    inop = @times;
end

if ~exist('outop', 'var')
    outop = @sum;
end

[n, m] = size(A);
[m2, o] = size(B);
if m2 ~= m
    error('nonconformant arguments (op1 is %ix%i, op2 is %ix%i)\n', n, m, m2, o);
end

C = [];
if issparse(A) || issparse(B)
    C = sparse(o,n);
else
    C = zeros(o,n);
end

A = A';
for i=1:n
    C(:,i) = outop(bsxfun(inop, A(:,i), B))';
end
C = C';

end
Tested with both sparse and normal matrices: the performance gap is a lot less with sparse matrices (3x slower) than with normal matrices (~100x slower).
I think this is slower than bsxfun implementations, but at least it doesn't overflow memory:
A = randi(10, 1000);
C = genmtimes(A, A');
If anyone has anything better to offer, I'm still looking for a better alternative!
I have three matrices in Matlab: A, which is n x m, B, which is p x n, and C, which is r x m.
Say we initialize some matrices using:
A = rand(3,4);
B = rand(2,3);
C = rand(5,4);
The following two are equivalent:
>> s=0;
>> for i=1:3
       for j=1:4
           s = s + A(i,j)*B(:,i)*C(:,j)';
       end;
   end
>> s
s =
2.6823 2.2440 3.5056 2.0856 2.1551
2.0656 1.7310 2.6550 1.5767 1.6457
>> B*A*C'
ans =
2.6823 2.2440 3.5056 2.0856 2.1551
2.0656 1.7310 2.6550 1.5767 1.6457
The latter is much more efficient.
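The equivalence follows from writing each product in terms of columns. A short derivation, with b_i denoting column i of B and c_j column j of C (notation introduced here only for this derivation):

```latex
s = \sum_{i=1}^{n}\sum_{j=1}^{m} A_{ij}\, b_i c_j^{\top}
  = \sum_{i=1}^{n} b_i \Big(\sum_{j=1}^{m} A_{ij}\, c_j^{\top}\Big)
  = \sum_{i=1}^{n} b_i \,(A C^{\top})_{i,:}
  = B A C^{\top}
```

The inner sum is row i of A C^T, and summing column i of B times that row over all i is exactly the product B (A C^T).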
I can't find any efficient version for the following variant of the loop:
s=0;
for i=1:3
    for j=1:4
        x = A(i,j)*B(:,i)*C(:,j)';
        s = s + x/sum(sum(x));
    end;
end
Here, the matrices being added are normalized by the sum of their values after each step.
Any ideas how to make this efficient like the matrix multiplication above? I thought maybe accumarray could help, but not sure how.
You can do it efficiently with bsxfun:
aux1 = bsxfun(@times, permute(B,[1 3 2]), permute(C,[3 1 4 2]));
aux2 = sum(sum(aux1,1),2);
s = sum(sum(bsxfun(@rdivide, aux1, aux2),3),4);
Note that, because of the normalization, the result is independent of A, assuming it doesn't contain any zero entries (if it does the result is undefined).
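As a sanity check, the bsxfun result can be compared with the original double loop. The sketch below also suggests why A drops out: each term x/sum(sum(x)) reduces to the outer product of a normalized column of B and a normalized column of C, so the total collapses to a single outer product. The names Bn, Cn, s_loop, s_bsx and s_outer are made up for this example, and the inputs are small fixed matrices with no zero entries.

```matlab
A = [1 2 3 4; 5 6 7 8; 9 10 11 12];   % 3-by-4, no zero entries
B = [1 2 3; 4 5 6];                   % 2-by-3
C = magic(5); C = C(:,1:4);           % 5-by-4

% original loop from the question
s_loop = 0;
for i = 1:3
    for j = 1:4
        x = A(i,j)*B(:,i)*C(:,j)';
        s_loop = s_loop + x/sum(sum(x));
    end
end

% bsxfun version from the answer
aux1 = bsxfun(@times, permute(B,[1 3 2]), permute(C,[3 1 4 2]));
aux2 = sum(sum(aux1,1),2);
s_bsx = sum(sum(bsxfun(@rdivide, aux1, aux2),3),4);

% closed form: normalize each column to sum 1, then one outer product
Bn = bsxfun(@rdivide, B, sum(B,1));
Cn = bsxfun(@rdivide, C, sum(C,1));
s_outer = sum(Bn,2) * sum(Cn,2)';

disp(max(abs(s_loop(:) - s_bsx(:))))     % difference, should be near zero
disp(max(abs(s_loop(:) - s_outer(:))))   % difference, should be near zero
```

If the outer-product form holds for your data, it avoids building the 4-D intermediate array altogether.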
I'm fairly new to MATLAB. Normal matrix multiplication of an M x K matrix by a K x N matrix -- C = A * B -- has c_ij = sum(a_ik * b_kj, k = 1:K). What if I want this to be instead c_ij = sum(op(a_ik, b_kj), k = 1:K) for some simple binary operation op? Is there any nice way to vectorize this in MATLAB (or maybe even a built-in function)?
EDIT: This is currently the best I can do.
% A is M x K, B is K x N
% op is min
C = zeros(M, N);
for i = 1:M
    C(i, :) = sum(bsxfun(@min, A(i, :)', B));
end
Listed in this post is a vectorized approach that sticks with bsxfun, using permute to create the singleton dimensions that bsxfun needs for singleton expansion, and thus essentially replaces the loop in the original post. Keep in mind that bsxfun is memory hungry, so expect a speedup only until the intermediate array is stretched too far. Here's the final solution code -
op = @min; %// Edit this with your own function/ operation
C = sum(bsxfun(op, permute(A,[1 3 2]),permute(B,[3 2 1])),3)
NB - The above solution was inspired by Removing four nested loops in Matlab.
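A tiny worked example makes it easy to check the permute pattern by hand. This is only a sketch with made-up inputs; C_loop is a name introduced here for the loop-based reference.

```matlab
A = [1 5; 3 2];   % M-by-K
B = [4 0; 1 6];   % K-by-N
op = @min;

% A becomes M-by-1-by-K, B becomes 1-by-N-by-K; op expands to M-by-N-by-K
C = sum(bsxfun(op, permute(A,[1 3 2]), permute(B,[3 2 1])), 3);

% reference: loop version from the question (explicit dim for safety)
C_loop = zeros(size(A,1), size(B,2));
for i = 1:size(A,1)
    C_loop(i,:) = sum(bsxfun(op, A(i,:)', B), 1);
end
disp(C)   % expected: [2 5; 4 2]
```

For instance, C(1,2) = min(1,0) + min(5,6) = 0 + 5 = 5.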
If the operator can operate element-by-element (like .*):
if(size(A,2)~=size(B,1))
    error('Inner matrix dimensions must agree.');
end
C = zeros(size(A,1),size(B,2));
for i = 1:size(A,1)
    for j = 1:size(B,2)
        C(i,j) = sum(binaryOp(A(i,:)',B(:,j)));
    end
end
You can always write the loops yourself:
A = rand(2,3);
B = rand(3,4);
op = @times; %# use your own function here
C = zeros(size(A,1),size(B,2));
for i=1:size(A,1)
    for j=1:size(B,2)
        for k=1:size(A,2)
            C(i,j) = C(i,j) + op(A(i,k),B(k,j));
        end
    end
end
isequal(C,A*B)
Depending on your specific needs, you may be able to use bsxfun in 3D to trick the binary operator. See this answer for more info: https://stackoverflow.com/a/23808285/1121352
Another alternative would be to use cellfun with a custom function:
http://matlabgeeks.com/tips-tutorials/computation-using-cellfun/
Suppose I have an AxBxC matrix X and a BxD matrix Y.
Is there a non-loop method by which I can multiply each of the C AxB matrices with Y?
As a personal preference, I like my code to be as succinct and readable as possible.
Here's what I would have done, though it doesn't meet your 'no-loops' requirement:
for m = 1:C
Z(:,:,m) = X(:,:,m)*Y;
end
This results in an A x D x C matrix Z.
And of course, you can always pre-allocate Z to speed things up by using Z = zeros(A,D,C);.
You can do this in one line using the functions NUM2CELL to break the matrix X into a cell array and CELLFUN to operate across the cells:
Z = cellfun(@(x) x*Y,num2cell(X,[1 2]),'UniformOutput',false);
The result Z is a 1-by-C cell array where each cell contains an A-by-D matrix. If you want Z to be an A-by-D-by-C matrix, you can use the CAT function:
Z = cat(3,Z{:});
NOTE: My old solution used MAT2CELL instead of NUM2CELL, which wasn't as succinct:
[A,B,C] = size(X);
Z = cellfun(@(x) x*Y,mat2cell(X,A,B,ones(1,C)),'UniformOutput',false);
Here's a one-line solution (two if you want to split into 3rd dimension):
A = 2;
B = 3;
C = 4;
D = 5;
X = rand(A,B,C);
Y = rand(B,D);
%# calculate result in one big matrix
Z = reshape(reshape(permute(X, [2 1 3]), [A B*C]), [B A*C])' * Y;
%'# split into third dimension
Z = permute(reshape(Z',[D A C]),[2 1 3]);
Hence now: Z(:,:,i) contains the result of X(:,:,i) * Y
Explanation:
The above may look confusing, but the idea is simple.
First I take the slices along the third dimension of X and concatenate them vertically along the first dimension:
XX = cat(1, X(:,:,1), X(:,:,2), ..., X(:,:,C))
... the difficulty was that C is a variable, hence you can't generalize that expression using cat or vertcat. Next we multiply this by Y:
ZZ = XX * Y;
Finally I split it back into the third dimension:
Z(:,:,1) = ZZ(1:2, :);
Z(:,:,2) = ZZ(3:4, :);
Z(:,:,3) = ZZ(5:6, :);
Z(:,:,4) = ZZ(7:8, :);
So you can see it only requires one matrix multiplication, but you have to reshape the matrix before and after.
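To convince yourself that the reshaping is right, the result can be checked against the plain loop. Here is a sketch with integer-valued data so that the comparison is exact. Z_loop is a name introduced here, and the inner reshape from the one-liner is dropped because it does not change the element order.

```matlab
A = 2; B = 3; C = 4; D = 5;
X = reshape(1:A*B*C, [A B C]);   % A-by-B-by-C
Y = reshape(1:B*D, [B D]);       % B-by-D

% stack the C slices vertically, multiply once, then unstack
XX = reshape(permute(X, [2 1 3]), [B A*C])';      % (A*C)-by-B
ZZ = XX * Y;                                      % (A*C)-by-D
Z  = permute(reshape(ZZ', [D A C]), [2 1 3]);     % A-by-D-by-C

% reference loop
Z_loop = zeros(A, D, C);
for m = 1:C
    Z_loop(:,:,m) = X(:,:,m) * Y;
end
disp(isequal(Z, Z_loop))
```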
I'm approaching the exact same issue, with an eye for the most efficient method. There are roughly three approaches that I see around, short of using outside libraries (e.g., mtimesx):
1) Loop through slices of the 3D matrix
2) repmat-and-permute wizardry
3) cellfun multiplication
I recently compared all three methods to see which was quickest. My intuition was that (2) would be the winner. Here's the code:
% generate data
A = 20;
B = 30;
C = 40;
D = 50;
X = rand(A,B,C);
Y = rand(B,D);
% ------ Approach 1: Loop (via @Zaid)
tic
Z1 = zeros(A,D,C);
for m = 1:C
    Z1(:,:,m) = X(:,:,m)*Y;
end
toc
% ------ Approach 2: Reshape+Permute (via @Amro)
tic
Z2 = reshape(reshape(permute(X, [2 1 3]), [A B*C]), [B A*C])' * Y;
Z2 = permute(reshape(Z2',[D A C]),[2 1 3]);
toc
% ------ Approach 3: cellfun (via @gnovice)
tic
Z3 = cellfun(@(x) x*Y,num2cell(X,[1 2]),'UniformOutput',false);
Z3 = cat(3,Z3{:});
toc
All three approaches produced the same output (phew!), but, surprisingly, the loop was the fastest:
Elapsed time is 0.000418 seconds.
Elapsed time is 0.000887 seconds.
Elapsed time is 0.001841 seconds.
Note that the times can vary quite a lot from one trial to another, and sometimes (2) comes out the slowest. These differences become more dramatic with larger data. But with much bigger data, (3) beats (2). The loop method is still best.
% pretty big data...
A = 200;
B = 300;
C = 400;
D = 500;
Elapsed time is 0.373831 seconds.
Elapsed time is 0.638041 seconds.
Elapsed time is 0.724581 seconds.
% even bigger....
A = 200;
B = 200;
C = 400;
D = 5000;
Elapsed time is 4.314076 seconds.
Elapsed time is 11.553289 seconds.
Elapsed time is 5.233725 seconds.
But the loop method can be slower than (2), if the looped dimension is much larger than the others.
A = 2;
B = 3;
C = 400000;
D = 5;
Elapsed time is 0.780933 seconds.
Elapsed time is 0.073189 seconds.
Elapsed time is 2.590697 seconds.
So (2) wins by a big factor, in this (maybe extreme) case. There may not be an approach that is optimal in all cases, but the loop is still pretty good, and best in many cases. It is also best in terms of readability. Loop away!
Nope. There are several ways, but it always comes out in a loop, direct or indirect.
Just out of curiosity, why would you want that anyway?
To answer the question, and for readability, please see:
ndmult, by ajuanpi (Juan Pablo Carbajal), 2013, GNU GPL
Input
2 arrays
dim
Example
nT = 100;
t = 2*pi*linspace (0,1,nT)';
# 2 experiments measuring 3 signals at nT timestamps
signals = zeros(nT,3,2);
signals(:,:,1) = [sin(2*t) cos(2*t) sin(4*t).^2];
signals(:,:,2) = [sin(2*t+pi/4) cos(2*t+pi/4) sin(4*t+pi/6).^2];
sT(:,:,1) = signals(:,:,1)';
sT(:,:,2) = signals(:,:,2)';
G = ndmult (signals,sT,[1 2]);
Source
Original source. I added inline comments.
function M = ndmult (A,B,dim)
dA = dim(1);
dB = dim(2);
# reshape A into 2d
sA = size (A);
nA = length (sA);
perA = [1:(dA-1) (dA+1):(nA-1) nA dA](1:nA);
Ap = permute (A, perA);
Ap = reshape (Ap, prod (sA(perA(1:end-1))), sA(perA(end)));
# reshape B into 2d
sB = size (B);
nB = length (sB);
perB = [dB 1:(dB-1) (dB+1):(nB-1) nB](1:nB);
Bp = permute (B, perB);
Bp = reshape (Bp, sB(perB(1)), prod (sB(perB(2:end))));
# multiply
M = Ap * Bp;
# reshape back to original format
s = [sA(perA(1:end-1)) sB(perB(2:end))];
M = squeeze (reshape (M, s));
endfunction
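The indexing of a bracketed expression, [...](1:nA), in the source above is Octave-only syntax. For MATLAB, the two permutation vectors can apparently be built directly instead. The sketch below inlines the same steps for one small case (contracting dimension 2 of a 3-D array with dimension 1 of a matrix) and checks the result page by page. It is an adaptation for illustration, not the original ndmult, and the name ref is made up.

```matlab
A = reshape(1:24, [2 3 4]);   % 2-by-3-by-4
B = reshape(1:6,  [3 2]);     % 3-by-2
dA = 2; dB = 1;               % contract dim 2 of A with dim 1 of B

% move dA to the end of A, dB to the front of B (MATLAB-compatible)
sA = size(A); nA = numel(sA);
perA = [1:(dA-1) (dA+1):nA dA];
Ap = reshape(permute(A, perA), [], sA(dA));

sB = size(B); nB = numel(sB);
perB = [dB 1:(dB-1) (dB+1):nB];
Bp = reshape(permute(B, perB), sB(dB), []);

% one matrix product, then restore the remaining dimensions
M = reshape(Ap * Bp, [sA(perA(1:end-1)) sB(perB(2:end))]);   % 2-by-4-by-2

% reference: multiply each page of A by B
ref = zeros(2, 4, 2);
for k = 1:4
    ref(:,k,:) = reshape(A(:,:,k) * B, [2 1 2]);
end
disp(isequal(M, ref))
```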
I highly recommend the MMX toolbox for MATLAB. It multiplies n-dimensional arrays very fast.
The advantages of MMX are:
It is easy to use.
Multiply n-dimensional matrices (actually it can multiply arrays of 2-D matrices)
It performs other matrix operations (transpose, Quadratic Multiply, Chol decomposition and more)
It is compiled C code and uses multi-threaded computation for speed.
For this problem, you just need to write this command:
C=mmx('mul',X,Y);
Here is a benchmark of the candidate methods. For more detail, refer to this question.
1.6571 # FOR-loop
4.3110 # ARRAYFUN
3.3731 # NUM2CELL/FOR-loop/CELL2MAT
2.9820 # NUM2CELL/CELLFUN/CELL2MAT
0.0244 # Loop Unrolling
0.0221 # MMX toolbox <===================
I would like to share my answer to the problems of:
1) making the tensor product of two tensors (of any valence);
2) making the contraction of two tensors along any dimension.
Here are my subroutines for the first and second tasks:
1) tensor product:
function [C] = tensor(A,B)
C = squeeze( reshape( repmat(A(:), 1, numel(B)).*B(:).' , [size(A),size(B)] ) );
end
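A quick check of the tensor product on small inputs. Note that the repmat(...).*B(:).' expression relies on implicit expansion (R2016b+ or Octave); the sketch below uses the plain outer product A(:)*B(:).' instead, which should give the same values on any version. The inputs are made up for illustration.

```matlab
A = [1 2; 3 4];   % 2-by-2
B = [10; 20];     % 2-by-1

% outer product of the flattened tensors, reshaped to the combined size
C = squeeze(reshape(A(:) * B(:).', [size(A), size(B)]));   % 2-by-2-by-2

% C(i,j,k) should equal A(i,j)*B(k)
disp(C(:,:,1))   % 10*A
disp(C(:,:,2))   % 20*A
```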
2) contraction:
Here A and B are the tensors to be contracted along the dimensions i and j respectively. The lengths of these dimensions should be equal, of course. There's no check for this (it would obscure the code), but apart from this it works well.
function [C] = tensorcontraction(A,B, i,j)
sa = size(A);
La = length(sa);
ia = 1:La;
ia(i) = [];
ia = [ia i];
sb = size(B);
Lb = length(sb);
ib = 1:Lb;
ib(j) = [];
ib = [j ib];
% making the i-th dimension the last in A
A1 = permute(A, ia);
% making the j-th dimension the first in B
B1 = permute(B, ib);
% making both A and B 2D-matrices to make use of the
% matrix multiplication along the second dimension of A
% and the first dimension of B
A2 = reshape(A1, [],sa(i));
B2 = reshape(B1, sb(j),[]);
% this implicitly assumes that sa(i) == sb(j);
% otherwise the multiplication below errors out
C2 = A2*B2;
% back to the original shape with the exception
% of dimensions along which we've just contracted
sa(i) = [];
sb(j) = [];
C = squeeze( reshape( C2, [sa,sb] ) );
end
Any criticism?
I would think recursion, but that's the only other non-loop method you can use.
You could "unroll" the loop, i.e., write out sequentially all the multiplications that would occur in the loop.