if i have A=[3 4 5 6] and B=[6 5 4] then i want to compare each value in A with all values in B,
if this value is greater then increase counter with 1 and if this value is equal then increase another counter with 1
If you want an array that corresponds to the result of each value in A, you could do
arrayfun(#(x) sum(x > B), A)
this gives [0, 0, 1, 2]. If you want the total sum you would just put sum(...) around that:
sum(arrayfun(#(x) sum(x > B), A))
this gives 3.
For the equal-counter, you can simply change > to ==:
arrayfun(#(x) sum(x == B), A)
this gives [0, 1, 1, 1].
Another approach in comparison to arrayfun would be bsxfun. Though it takes a bit more memory, I'd argue that it's faster. arrayfun is implicitly a for loop and using loops in MATLAB is usually slower than vectorized approaches.
If you want the greater than case, use the gt function with bsxfun, so:
>> A = [3 4 5 6];
>> B = [6 5 4];
>> sum(bsxfun(#gt, A, B.'), 1)
ans =
0 0 1 2
If you want to accumulate all of the values that match the criterion, you can put another sum call within this bsxfun call:
>> sum(sum(bsxfun(#gt, A, B.'), 1))
ans =
3
For the greater than or equal case, use ge:
>> sum(bsxfun(#ge, A, B.'), 1)
ans =
0 1 2 3
For the equality case, use eq:
>> sum(bsxfun(#eq, A, B.'), 1)
ans =
0 1 1 1
Again, if you want to accumulate all of the values that match the criterion, nest another sum call with the above results.
I would like to do a function to generalize matrix multiplication. Basically, it should be able to do the standard matrix multiplication, but it should allow to change the two binary operators product/sum by any other function.
The goal is to be as efficient as possible, both in terms of CPU and memory. Of course, it will always be less efficient than A*B, but the operators flexibility is the point here.
Here are a few commands I could come up after reading various interesting threads:
A = randi(10, 2, 3);
B = randi(10, 3, 4);
% 1st method
C = sum(bsxfun(#mtimes, permute(A,[1 3 2]),permute(B,[3 2 1])), 3)
% Alternative: C = bsxfun(#(a,b) mtimes(a',b), A', permute(B, [1 3 2]))
% 2nd method
C = sum(bsxfun(#(a,b) a*b, permute(A,[1 3 2]),permute(B,[3 2 1])), 3)
% 3rd method (Octave-only)
C = sum(permute(A, [1 3 2]) .* permute(B, [3 2 1]), 3)
% 4th method (Octave-only): multiply nxm A with nx1xd B to create a nxmxd array
C = bsxfun(#(a, b) sum(times(a,b)), A', permute(B, [1 3 2]));
C = C2 = squeeze(C(1,:,:)); % sum and turn into mxd
The problem with methods 1-3 are that they will generate n matrices before collapsing them using sum(). 4 is better because it does the sum() inside the bsxfun, but bsxfun still generates n matrices (except that they are mostly empty, containing only a vector of non-zeros values being the sums, the rest is filled with 0 to match the dimensions requirement).
What I would like is something like the 4th method but without the useless 0 to spare memory.
Any idea?
Here is a slightly more polished version of the solution you posted, with some small improvements.
We check if we have more rows than columns or the other way around, and then do the multiplication accordingly by choosing either to multiply rows with matrices or matrices with columns (thus doing the least amount of loop iterations).
Note: This may not always be the best strategy (going by rows instead of by columns) even if there are less rows than columns; the fact that MATLAB arrays are stored in a column-major order in memory makes it more efficient to slice by columns, as the elements are stored consecutively. Whereas accessing rows involves traversing elements by strides (which is not cache-friendly -- think spatial locality).
Other than that, the code should handle double/single, real/complex, full/sparse (and errors where it is not a possible combination). It also respects empty matrices and zero-dimensions.
function C = my_mtimes(A, B, outFcn, inFcn)
% default arguments
if nargin < 4, inFcn = #times; end
if nargin < 3, outFcn = #sum; end
% check valid input
assert(ismatrix(A) && ismatrix(B), 'Inputs must be 2D matrices.');
assert(isequal(size(A,2),size(B,1)),'Inner matrix dimensions must agree.');
assert(isa(inFcn,'function_handle') && isa(outFcn,'function_handle'), ...
'Expecting function handles.')
% preallocate output matrix
M = size(A,1);
N = size(B,2);
if issparse(A)
args = {'like',A};
elseif issparse(B)
args = {'like',B};
else
args = {superiorfloat(A,B)};
end
C = zeros(M,N, args{:});
% compute matrix multiplication
% http://en.wikipedia.org/wiki/Matrix_multiplication#Inner_product
if M < N
% concatenation of products of row vectors with matrices
% A*B = [a_1*B ; a_2*B ; ... ; a_m*B]
for m=1:M
%C(m,:) = A(m,:) * B;
%C(m,:) = sum(bsxfun(#times, A(m,:)', B), 1);
C(m,:) = outFcn(bsxfun(inFcn, A(m,:)', B), 1);
end
else
% concatenation of products of matrices with column vectors
% A*B = [A*b_1 , A*b_2 , ... , A*b_n]
for n=1:N
%C(:,n) = A * B(:,n);
%C(:,n) = sum(bsxfun(#times, A, B(:,n)'), 2);
C(:,n) = outFcn(bsxfun(inFcn, A, B(:,n)'), 2);
end
end
end
Comparison
The function is no doubt slower throughout, but for larger sizes it is orders of magnitude worse than the built-in matrix-multiplication:
(tic/toc times in seconds)
(tested in R2014a on Windows 8)
size mtimes my_mtimes
____ __________ _________
400 0.0026398 0.20282
600 0.012039 0.68471
800 0.014571 1.6922
1000 0.026645 3.5107
2000 0.20204 28.76
4000 1.5578 221.51
Here is the test code:
sz = [10:10:100 200:200:1000 2000 4000];
t = zeros(numel(sz),2);
for i=1:numel(sz)
n = sz(i); disp(n)
A = rand(n,n);
B = rand(n,n);
tic
C = A*B;
t(i,1) = toc;
tic
D = my_mtimes(A,B);
t(i,2) = toc;
assert(norm(C-D) < 1e-6)
clear A B C D
end
semilogy(sz, t*1000, '.-')
legend({'mtimes','my_mtimes'}, 'Interpreter','none', 'Location','NorthWest')
xlabel('Size N'), ylabel('Time [msec]'), title('Matrix Multiplication')
axis tight
Extra
For completeness, below are two more naive ways to implement the generalized matrix multiplication (if you want to compare the performance, replace the last part of the my_mtimes function with either of these). I'm not even gonna bother posting their elapsed times :)
C = zeros(M,N, args{:});
for m=1:M
for n=1:N
%C(m,n) = A(m,:) * B(:,n);
%C(m,n) = sum(bsxfun(#times, A(m,:)', B(:,n)));
C(m,n) = outFcn(bsxfun(inFcn, A(m,:)', B(:,n)));
end
end
And another way (with a triple-loop):
C = zeros(M,N, args{:});
P = size(A,2); % = size(B,1);
for m=1:M
for n=1:N
for p=1:P
%C(m,n) = C(m,n) + A(m,p)*B(p,n);
%C(m,n) = plus(C(m,n), times(A(m,p),B(p,n)));
C(m,n) = outFcn([C(m,n) inFcn(A(m,p),B(p,n))]);
end
end
end
What to try next?
If you want to squeeze out more performance, you're gonna have to move to a C/C++ MEX-file to cut down on the overhead of interpreted MATLAB code. You can still take advantage of optimized BLAS/LAPACK routines by calling them from MEX-files (see the second part of this post for an example). MATLAB ships with Intel MKL library which frankly you cannot beat when it comes to linear algebra computations on Intel processors.
Others have already mentioned a couple of submissions on the File Exchange that implement general-purpose matrix routines as MEX-files (see #natan's answer). Those are especially effective if you link them against an optimized BLAS library.
Why not just exploit bsxfun's ability to accept an arbitrary function?
C = shiftdim(feval(f, (bsxfun(g, A.', permute(B,[1 3 2])))), 1);
Here
f is the outer function (corrresponding to sum in the matrix-multiplication case). It should accept a 3D array of arbitrary size mxnxp and operate along its columns to return a 1xmxp array.
g is the inner function (corresponding to product in the matrix-multiplication case). As per bsxfun, it should accept as input either two column vectors of the same size, or one column vector and one scalar, and return as output a column vector of the same size as the input(s).
This works in Matlab. I haven't tested in Octave.
Example 1: Matrix-multiplication:
>> f = #sum; %// outer function: sum
>> g = #times; %// inner function: product
>> A = [1 2 3; 4 5 6];
>> B = [10 11; -12 -13; 14 15];
>> C = shiftdim(feval(f, (bsxfun(g, A.', permute(B,[1 3 2])))), 1)
C =
28 30
64 69
Check:
>> A*B
ans =
28 30
64 69
Example 2: Consider the above two matrices with
>> f = #(x,y) sum(abs(x)); %// outer function: sum of absolute values
>> g = #(x,y) max(x./y, y./x); %// inner function: "symmetric" ratio
>> C = shiftdim(feval(f, (bsxfun(g, A.', permute(B,[1 3 2])))), 1)
C =
14.8333 16.1538
5.2500 5.6346
Check: manually compute C(1,2):
>> sum(abs( max( (A(1,:))./(B(:,2)).', (B(:,2)).'./(A(1,:)) ) ))
ans =
16.1538
Without diving into the details, there are tools such as mtimesx and MMX that are fast general purpose matrix and scalar operations routines. You can look into their code and adapt them to your needs.
It would most likely be faster than matlab's bsxfun.
After examination of several processing functions like bsxfun, it seems it won't be possible to do a direct matrix multiplication using these (what I mean by direct is that the temporary products are not stored in memory but summed ASAP and then other sum-products are processed), because they have a fixed size output (either the same as input, either with bsxfun singleton expansion the cartesian product of dimensions of the two inputs). It's however possible to trick Octave a bit (which does not work with MatLab who checks the output dimensions):
C = bsxfun(#(a,b) sum(bsxfun(#times, a, B))', A', sparse(1, size(A,1)))
C = bsxfun(#(a,b) sum(bsxfun(#times, a, B))', A', zeros(1, size(A,1), 2))(:,:,2)
However do not use them because the outputted values are not reliable (Octave can mangle or even delete them and return 0!).
So for now on I am just implementing a semi-vectorized version, here's my function:
function C = genmtimes(A, B, outop, inop)
% C = genmtimes(A, B, inop, outop)
% Generalized matrix multiplication between A and B. By default, standard sum-of-products matrix multiplication is operated, but you can change the two operators (inop being the element-wise product and outop the sum).
% Speed note: about 100-200x slower than A*A' and about 3x slower when A is sparse, so use this function only if you want to use a different set of inop/outop than the standard matrix multiplication.
if ~exist('inop', 'var')
inop = #times;
end
if ~exist('outop', 'var')
outop = #sum;
end
[n, m] = size(A);
[m2, o] = size(B);
if m2 ~= m
error('nonconformant arguments (op1 is %ix%i, op2 is %ix%i)\n', n, m, m2, o);
end
C = [];
if issparse(A) || issparse(B)
C = sparse(o,n);
else
C = zeros(o,n);
end
A = A';
for i=1:n
C(:,i) = outop(bsxfun(inop, A(:,i), B))';
end
C = C';
end
Tested with both sparse and normal matrices: the performance gap is a lot less with sparse matrices (3x slower) than with normal matrices (~100x slower).
I think this is slower than bsxfun implementations, but at least it doesn't overflow memory:
A = randi(10, 1000);
C = genmtimes(A, A');
If anyone has any better to offer, I'm still looking for a better alternative!
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
How to Add a row vector to a column vector like matrix multiplication
I have a nx1 vector and a 1xn vector. I want to add them in a special manner like matrix multiplication in an efficient manner (vectorized):
Example:
A=[1 2 3]'
B=[4 5 6]
A \odd_add B =
[1+4 1+5 1+6
2+4 2+5 2+6
3+4 3+5 3+6
]
I have used bsxfun in MATLAB, but I think it is slow. Please help me...
As mentioned by #b3. this would be an appropriate place to use repmat. However in general, and especially if you are dealing with very large matrices, bsxfun normally makes a better substitute. In this case:
>> bsxfun(#plus, [1,2,3]', [4,5,6])
returns the same result, using about a third the memory in the large-matrix limit.
bsxfun basically applies the function in the first argument to every combination of items in the second and third arguments, placing the results in a matrix according to the shape of the input vectors.
I present a comparison of the different methods mentioned here. I am using the TIMEIT function to get robust estimates (takes care of warming up the code, average timing on multiple runs, ..):
function testBSXFUN(N)
%# data
if nargin < 1
N = 500; %# N = 10, 100, 1000, 10000
end
A = (1:N)';
B = (1:N);
%# functions
f1 = #() funcRepmat(A,B);
f2 = #() funcTonyTrick(A,B);
f3 = #() funcBsxfun(A,B);
%# timeit
t(1) = timeit( f1 );
t(2) = timeit( f2 );
t(3) = timeit( f3 );
%# time results
fprintf('N = %d\n', N);
fprintf('REPMAT: %f, TONY_TRICK: %f, BSXFUN: %f\n', t);
%# validation
v{1} = f1();
v{2} = f2();
v{3} = f3();
assert( isequal(v{:}) )
end
where
function C = funcRepmat(A,B)
N = numel(A);
C = repmat(A,1,N) + repmat(B,N,1);
end
function C = funcTonyTrick(A,B)
N = numel(A);
C = A(:,ones(N,1)) + B(ones(N,1),:);
end
function C = funcBsxfun(A,B)
C = bsxfun(#plus, A, B);
end
The timings:
>> for N=[10 100 1000 5000], testBSXFUN(N); end
N = 10
REPMAT: 0.000065, TONY_TRICK: 0.000013, BSXFUN: 0.000031
N = 100
REPMAT: 0.000120, TONY_TRICK: 0.000065, BSXFUN: 0.000085
N = 1000
REPMAT: 0.032988, TONY_TRICK: 0.032947, BSXFUN: 0.010185
N = 5000
REPMAT: 0.810218, TONY_TRICK: 0.824297, BSXFUN: 0.258774
BSXFUN is a clear winner.
In matlab vectorization, there is no substitute for Tony's Trick in terms of speed in comparison to repmat or any other built in Matlab function for that matter. I am sure that the following code must be fastest for your purpose.
>> A = [1 2 3]';
>> B = [4 5 6];
>> AB_sum = A(:,ones(3,1)) + B(ones(3,1),:);
The speed differential will be much more apparent (at LEAST an order of magnitude) for larger size of A and B. See this test I conducted some time ago to ascertain the superiority of Tony's Trick over repmatin terms of time consumption.
REPMAT is your friend:
>> A = [1 2 3]';
>> B = [4 5 6];
>> AplusB = repmat(A, 1, 3) + repmat(B, 3, 1)
AplusB =
5 6 7
6 7 8
7 8 9