I was trying to vectorize a certain weighted sum but couldn't figure out how to do it. I have created a simple minimal working example below. I guess the solution involves either bsxfun or reshape and Kronecker products, but I still have not managed to get it working.
rng(1);
N = 200;
T1 = 5;
T2 = 7;
T3 = 10;
A = rand(N,T1,T2,T3);
w1 = rand(T1,1);
w2 = rand(T2,1);
w3 = rand(T3,1);
B = zeros(N,1);
for i = 1:N
    for j1 = 1:T1
        for j2 = 1:T2
            for j3 = 1:T3
                B(i) = B(i) + w1(j1) * w2(j2) * w3(j3) * A(i,j1,j2,j3);
            end
        end
    end
end
A = B;
For the two-dimensional case there is a smart answer here.
You can use an additional multiplication to extend the w1 * w2' grid from the previous answer so that it incorporates w3 as well. You can then use matrix multiplication again to multiply with a "flattened" version of A.
W = reshape(w1 * w2.', [], 1) * w3.';
B = reshape(A, size(A, 1), []) * W(:);
You could wrap the creation of the weights into its own function and make this generalizable to N weights. Since this uses recursion, N is limited to your current recursion limit (500 by default).
function W = createWeights(W, varargin)
    if numel(varargin) > 0
        W = createWeights(W(:) * varargin{1}(:).', varargin{2:end});
    end
end
And use it with:
W = createWeights(w1, w2, w3);
B = reshape(A, size(A, 1), []) * W(:);
Update
Using part of @CKT's very good suggestion to use kron, we could modify createWeights just a little bit.
function W = createWeights(W, varargin)
    if numel(varargin) > 0
        W = createWeights(kron(varargin{1}, W), varargin{2:end});
    end
end
Again, you couldn't generalize this that well for N-D unless you made some function to construct the Kronecker product vector, but how about
A = reshape(A, N, []) * kron(w3, kron(w2, w1));
This is the logic behind it:
ww1 = repmat (permute (w1, [4, 1, 2, 3]), [N, 1, T2, T3]);
ww2 = repmat (permute (w2, [3, 4, 1, 2]), [N, T1, 1, T3]);
ww3 = repmat (permute (w3, [2, 3, 4, 1]), [N, T1, T2, 1 ]);
B = ww1 .* ww2 .* ww3 .* A;
B = sum (B(:,:), 2)
You can avoid permute by creating w1, w2, and w3 with the appropriate orientation in the first place. You can also use bsxfun instead of repmat where appropriate for extra performance; I'm just showing the logic here, and repmat is easier to follow.
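For instance, here is a minimal sketch of the same logic using implicit expansion (R2016b or newer) instead of repmat, reshaping each weight vector onto its own dimension:
B = A .* reshape(w1, 1, [])      ... % 1-by-T1, broadcast along dim 2
      .* reshape(w2, 1, 1, [])   ... % 1-by-1-by-T2, broadcast along dim 3
      .* reshape(w3, 1, 1, 1, []);   % 1-by-1-by-1-by-T3, broadcast along dim 4
B = sum(B(:,:), 2);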
EDIT: Generalised version for arbitrary input dimensions:
Dims = {N, T1, T2, T3}; % add T4, T5, T6, etc as appropriate
Params = cell (1, length (Dims));
Params{1} = rand (Dims{:});
for n = 2 : length (Dims)
    DimSubscripts = ones (1, length (Dims)); DimSubscripts(n) = Dims{n};
    RepSubscripts = [Dims{:}]; RepSubscripts(n) = 1;
    Params{n} = repmat (rand (DimSubscripts), RepSubscripts);
end
B = times (Params{:});
B = sum (B(:,:), 2)
If we're going the route of having functions anyway, and are favoring performance over elegance/brevity, then consider this:
function B = weightReduce(A, varargin)
    % Contract away the trailing dimensions of A one weight vector at a
    % time, starting from the last, each via a single matrix-vector product.
    B = A;
    for i = length(varargin):-1:1
        N = length(varargin{i});
        B = reshape(B, [], N) * varargin{i};
    end
end
This is the performance comparison I see:
tic;
for i = 1:10000
    W = createWeights(w1,w2,w3);
    B = reshape(A, size(A,1), [])*W(:);
end
toc
Elapsed time is 0.920821 seconds.
tic;
for i = 1:10000
    B2 = weightReduce(A, w1, w2, w3);
end
toc
Elapsed time is 0.484470 seconds.
I want to realize component-wise matrix multiplication in MATLAB, which can be done using numpy.einsum in Python as below:
import numpy as np
M = 2
N = 4
I = 2000
J = 300
A = np.random.randn(M, M, I)
B = np.random.randn(M, M, N, J, I)
C = np.random.randn(M, J, I)
# using einsum
D = np.einsum('mki, klnji, lji -> mnji', A, B, C)
# naive for-loop
E = np.zeros((M, N, J, I))
for i in range(I):
    for j in range(J):
        for n in range(N):
            E[:, n, j, i] = A[:, :, i] @ B[:, :, n, j, i] @ C[:, j, i]
print(np.sum(np.abs(D - E)))  # expected small enough
So far I use for-loops over i, j, and n, but I would like to avoid them, or at least the loop over n.
Option 1: Calling numpy from MATLAB
Assuming your system is set up according to the documentation, and you have the numpy package installed, you could do (in MATLAB):
np = py.importlib.import_module('numpy');
M = 2;
N = 4;
I = 2000;
J = 300;
A = matpy.mat2nparray( randn(M, M, I) );
B = matpy.mat2nparray( randn(M, M, N, J, I) );
C = matpy.mat2nparray( randn(M, J, I) );
D = matpy.nparray2mat( np.einsum('mki, klnji, lji -> mnji', A, B, C) );
Where matpy can be found here.
Option 2: Native MATLAB
Here the most important part is to get the permutations right, so we need to keep track of our dimensions. We'll be using the following order:
I(1) J(2) K(3) L(4) M(5) N(6)
Now, I'll explain how I got the correct permute order (let's take the example of A): einsum expects the dimension order to be mki, which according to our numbering is 5 3 1. This tells us that the 1st dimension of A needs to become the 5th, the 2nd needs to become the 3rd, and the 3rd needs to become the 1st (in short 1->5, 2->3, 3->1). This also means that the "sourceless dimensions" (those that no original dimension maps to; in this case 2 4 6) should be singleton. Using ipermute this is really simple to write:
pA = ipermute(A, [5,3,1,2,4,6]);
In the above example, 1->5 means we write 5 first, and the same goes for the other two dimensions (yielding [5,3,1]). Then we just add the singletons (2,4,6) at the end to get [5,3,1,2,4,6]. Finally:
A = randn(M, M, I);
B = randn(M, M, N, J, I);
C = randn(M, J, I);
% Reference dim order: I(1) J(2) K(3) L(4) M(5) N(6)
pA = ipermute(A, [5,3,1,2,4,6]); % 1->5, 2->3, 3->1; 2nd, 4th & 6th are singletons
pB = ipermute(B, [3,4,6,2,1,5]); % 1->3, 2->4, 3->6, 4->2, 5->1; 5th is singleton
pC = ipermute(C, [4,2,1,3,5,6]); % 1->4, 2->2, 3->1; 3rd, 5th & 6th are singletons
pD = sum( ...
permute(pA .* pB .* pC, [5,6,2,1,3,4]), ... 1->5, 2->6, 3->2, 4->1; 3rd & 4th are singletons
[5,6]);
(see note regarding sum at the bottom of the post.)
Another way to do it in MATLAB, as mentioned by @AndrasDeak, is the following:
rD = squeeze(sum(reshape(A, [M, M, 1, 1, 1, I]) .* ...
reshape(B, [1, M, M, N, J, I]) .* ...
... % same as: reshape(B, [1, size(B)]) .* ...
... % same as: shiftdim(B,-1) .* ...
reshape(C, [1, 1, M, 1, J, I]), [2, 3]));
See also: squeeze, reshape, permute, ipermute, shiftdim.
Here's a full example that tests whether these methods are equivalent:
function q55913093
M = 2;
N = 4;
I = 2000;
J = 300;
mA = randn(M, M, I);
mB = randn(M, M, N, J, I);
mC = randn(M, J, I);
%% Option 1 - using numpy:
np = py.importlib.import_module('numpy');
A = matpy.mat2nparray( mA );
B = matpy.mat2nparray( mB );
C = matpy.mat2nparray( mC );
D = matpy.nparray2mat( np.einsum('mki, klnji, lji -> mnji', A, B, C) );
%% Option 2 - native MATLAB:
%%% Reference dim order: I(1) J(2) K(3) L(4) M(5) N(6)
pA = ipermute(mA, [5,3,1,2,4,6]); % 1->5, 2->3, 3->1; 2nd, 4th & 6th are singletons
pB = ipermute(mB, [3,4,6,2,1,5]); % 1->3, 2->4, 3->6, 4->2, 5->1; 5th is singleton
pC = ipermute(mC, [4,2,1,3,5,6]); % 1->4, 2->2, 3->1; 3rd, 5th & 6th are singletons
pD = sum( permute( ...
pA .* pB .* pC, [5,6,2,1,3,4]), ... % 1->5, 2->6, 3->2, 4->1; 3rd & 4th are singletons
[5,6]);
rD = squeeze(sum(reshape(mA, [M, M, 1, 1, 1, I]) .* ...
reshape(mB, [1, M, M, N, J, I]) .* ...
reshape(mC, [1, 1, M, 1, J, I]), [2, 3]));
%% Comparisons:
sum(abs(pD-D), 'all')
isequal(pD,rD)
Running the above we get that the results are indeed equivalent:
>> q55913093
ans =
2.1816e-10
ans =
logical
1
Note that these two methods of calling sum were introduced in recent releases, so you might need to replace them if your MATLAB is relatively old:
S = sum(A,'all')  % can be replaced by sum(A(:))
S = sum(A,vecdim) % can be replaced by sum(sum(A, dim1), dim2)
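For instance, the rD computation from above rewritten with the older sum syntax would be (just a sketch; note that the implicit expansion itself still requires R2016b or newer):
rD = squeeze(sum(sum(reshape(mA, [M, M, 1, 1, 1, I]) .* ...
                     reshape(mB, [1, M, M, N, J, I]) .* ...
                     reshape(mC, [1, 1, M, 1, J, I]), 2), 3));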
As requested in the comments, here's a benchmark comparing the methods:
function t = q55913093_benchmark(M,N,I,J)
if nargin == 0
    M = 2;
    N = 4;
    I = 2000;
    J = 300;
end
% Define the arrays in MATLAB
mA = randn(M, M, I);
mB = randn(M, M, N, J, I);
mC = randn(M, J, I);
% Define the arrays in numpy
np = py.importlib.import_module('numpy');
pA = matpy.mat2nparray( mA );
pB = matpy.mat2nparray( mB );
pC = matpy.mat2nparray( mC );
% Test for equivalence
D = cat(5, M1(), M2(), M3());
assert( sum(abs(D(:,:,:,:,1) - D(:,:,:,:,2)), 'all') < 1E-8 );
assert( isequal (D(:,:,:,:,2), D(:,:,:,:,3)));
% Time
t = [timeit(@M1,1), timeit(@M2,1), timeit(@M3,1)];
function out = M1()
    out = matpy.nparray2mat( np.einsum('mki, klnji, lji -> mnji', pA, pB, pC) );
end
function out = M2()
    out = permute( ...
        sum( ...
            ipermute(mA, [5,3,1,2,4,6]) .* ...
            ipermute(mB, [3,4,6,2,1,5]) .* ...
            ipermute(mC, [4,2,1,3,5,6]), [3,4] ...
        ), [5,6,2,1,3,4] ...
    );
end
function out = M3()
    out = squeeze(sum(reshape(mA, [M, M, 1, 1, 1, I]) .* ...
                      reshape(mB, [1, M, M, N, J, I]) .* ...
                      reshape(mC, [1, 1, M, 1, J, I]), [2, 3]));
end
end
On my system this results in:
>> q55913093_benchmark
ans =
1.3964 0.1864 0.2428
Which means that the 2nd method is preferable (at least for the default input sizes).
Suppose there are three n-by-n matrices X, Y, S. How can I quickly compute the following scalar b?
b = 0;
for i = 1:n
    b = b + sum(sum((X(i,:)' * Y(i,:) - S).^2));
end
The computational cost is O(n^3). There is a fast way to compute the sum of the outer products of the rows of two matrices. Specifically, the matrix C
C = zeros(n);
for i = 1:n
    C = C + X(i,:)' * Y(i,:);
end
can be calculated without a for loop as C = X.'*Y, which is much faster in practice. Is there a faster way to compute b?
You can use:
X2 = X.^2;
Y2 = Y.^2;
S2 = S.^2;
b = sum(sum(X2.' * Y2 - 2 * (X.' * Y ) .* S + n * S2));
Given your example
b = 0;
for i = 1:n
    b = b + sum(sum((X(i,:).' * Y(i,:) - S).^2));
end
We can first bring the summation out of the loop:
b = 0;
for i = 1:n
    b = b + (X(i,:).' * Y(i,:) - S).^2;
end
b = sum(b(:))
Knowing that we can write (a - b)^2 as a^2 - 2*a*b + b^2
b = 0;
for i = 1:n
    b = b + (X(i,:).' * Y(i,:)).^2 - 2 .* (X(i,:).' * Y(i,:)) .* S + S.^2;
end
b = sum(b(:))
And we know that (a * b) ^ 2 is the same as a^2 * b^2:
X2 = X.^2;
Y2 = Y.^2;
S2 = S.^2;
b = 0;
for i = 1:n
    b = b + (X2(i,:).' * Y2(i,:)) - 2 .* (X(i,:).' * Y(i,:)) .* S + S2;
end
b = sum(b(:))
Now we can compute each term separately:
b = sum(sum(X2.' * Y2 - 2 * (X.' * Y ) .* S + n * S2));
Here is the result of a test in Octave that compares my method, the two other methods provided by @AndrasDeak, and the original loop-based solution, for inputs of size 500*500:
===rahnema1 (B)===
Elapsed time is 0.0984299 seconds.
===Andras Deak (B2)===
Elapsed time is 7.86407 seconds.
===Andras Deak (B3)===
Elapsed time is 2.99158 seconds.
===Loop solution===
Elapsed time is 2.20357 seconds.
n=500;
X= rand(n);
Y= rand(n);
S= rand(n);
disp('===rahnema1 (B)===')
tic
X2 = X.^2;
Y2 = Y.^2;
S2 = S.^2;
b=sum(sum(X2.' * Y2 - 2 * (X.' * Y ) .* S + n * S2));
toc
disp('===Andras Deak (B2)===')
tic
b2 = sum(reshape((permute(reshape(X, [n, 1, n]).*Y, [3,2,1]) - S).^2, 1, []));
toc
disp('===Andras Deak (B3)===')
tic
b3 = sum(reshape((reshape(X, [n, 1, n]).*Y - reshape(S.', [1, n, n])).^2, 1, []));
toc
tic
b = 0;
for i = 1:n
    b = b + sum(sum((X(i,:)' * Y(i,:) - S).^2));
end
toc
You probably can't reduce the time complexity, but you can make use of vectorization to get rid of the loop and to exploit low-level code and caching as much as possible. Whether it's actually faster depends on your dimensions, so you need to do some timing tests to see if this is worth it:
% dummy data
n = 3;
X = rand(n);
Y = rand(n);
S = rand(n);
% vectorize
b2 = sum(reshape((permute(reshape(X, [n, 1, n]).*Y, [3,2,1]) - S).^2, 1, []));
% check
b - b2 % close to machine epsilon i.e. zero
What happens is that we insert a new singleton dimension in one of the arrays, ending up with an array of size [n, 1, n] against one with [n, n], the latter being implicitly the same as [n, n, 1]. The overlapping first index corresponds to the i in your loop, the remaining two indices correspond to the matrix indices of the dyadic product you have for each i. Then we permute the indices in order to put the "i" index last, so that we can again broadcast the result with S of (implicit) size [n, n, 1]. What we then have is a matrix of size [n, n, n] where the first two indices are matrix indices in your original and the last one corresponds to i. We then just have to take the square and sum each term (instead of summing twice I reshaped the array to a row and summed once).
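To make the shapes easier to follow, here is the same one-liner unpacked into steps (a sketch with intermediate variables; the comments show the array sizes):
P = reshape(X, [n, 1, n]) .* Y; % [n,1,n] .* [n,n] -> [n,n,n]; P(i,k,j) = X(i,j)*Y(i,k)
Q = permute(P, [3, 2, 1]);      % move the "i" index last: Q(j,k,i) = X(i,j)*Y(i,k)
R = (Q - S).^2;                 % S is [n,n], implicitly [n,n,1], broadcast over i
b2 = sum(R(:));                 % sum all n^3 terms at once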
A slight variation of the above transposes S instead of the 3d array which might be faster (again, you should time it):
b3 = sum(reshape((reshape(X, [n, 1, n]).*Y - reshape(S.', [1, n, n])).^2, 1, []));
In terms of performance, reshape is free (it only reinterprets data, it doesn't copy), but permute/transpose will often lead to a performance hit when data gets copied.
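If you want to see this effect on your machine, here's a quick sketch with timeit (the array size is an arbitrary assumption; the gap grows with size):
M = rand(2000);
t_reshape = timeit(@() reshape(M, 1, [])); % free: only reinterprets the data
t_permute = timeit(@() permute(M, [2 1])); % copies the data
fprintf('reshape: %g s, permute: %g s\n', t_reshape, t_permute);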
I work primarily in MATLAB but I think the answer should not be too hard to carry over from one language to another.
I have a multi-dimensional array X with dimensions [n, p, 3].
I would like to calculate the following multi-dimensional array.
T = zeros(p, p, p)
for i = 1:p
    for j = 1:p
        for k = 1:p
            T(i, j, k) = sum(X(:, i, 1) .* X(:, j, 2) .* X(:, k, 3));
        end
    end
end
The sum is of the elements of a length-n vector. Any help is appreciated!
You only need some permuting of dimensions and multiplication with singleton expansion:
T = sum(bsxfun(@times, bsxfun(@times, permute(X(:,:,1), [2 4 5 3 1]), permute(X(:,:,2), [4 2 5 3 1])), permute(X(:,:,3), [4 5 2 3 1])), 5);
From R2016b onwards, this can be written more easily as
T = sum(permute(X(:,:,1), [2 4 5 3 1]) .* permute(X(:,:,2), [4 2 5 3 1]) .* permute(X(:,:,3), [4 5 2 3 1]), 5);
As I mentioned in a comment, vectorization is not always a huge advantage any more. Therefore there are vectorization methods that slow down the code rather than speed it up. You must always time your solutions. Vectorization often involves the creation of large temporary arrays, or the copy of large amounts of data, which are avoided in loop code. It depends on the architecture, the size of the input, and many other factors if such a solution is going to be faster.
Nonetheless, in this case it seems vectorization approaches can yield a large speedup.
The first thing to notice about the original code is that X(:, i, 1) .* X(:, j, 2) gets re-computed in the inner loop, though it is constant there. Rewriting the inner loop as follows will save time:
Y = X(:, i, 1) .* X(:, j, 2);
for k = 1:p
    T(i, j, k) = sum(Y .* X(:, k, 3));
end
Now we notice that the inner loop is a dot product, and can be written as follows:
Y = X(:, i, 1) .* X(:, j, 2);
T(i, j, :) = Y.' * X(:, :, 3);
The .' transpose on Y does not copy the data, as Y is a vector. Next, we notice that X(:, :, 3) is indexed repeatedly. Let's move this out of the outer loop. Now I'm left with the following code:
T = zeros(p, p, p);
X1 = X(:, :, 1);
X2 = X(:, :, 2);
X3 = X(:, :, 3);
for i = 1:p
    for j = 1:p
        Y = X1(:, i) .* X2(:, j);
        T(i, j, :) = Y.' * X3;
    end
end
It is likely that removing the loop over j is equally easy, which would leave a single loop over i. But this is where I stop.
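For reference, here is a sketch of what that could look like (untimed, and it uses implicit expansion, so R2016b or newer; it continues from the X1, X2, X3 variables above):
for i = 1:p
    % all j at once: rows of the product index j, columns index k
    T(i, :, :) = reshape((X1(:, i) .* X2).' * X3, 1, p, p);
end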
These are the timings I see (R2017a, 3-year-old iMac with 4 cores). For n=10, p=20:
original: 0.0206
moving Y out the inner loop: 0.0100
removing inner loop: 0.0016
moving indexing out of loops: 7.6294e-04
Luis' answer: 1.9196e-04
For a larger array with n=50, p=100:
original: 2.9107
moving Y out the inner loop: 1.3488
removing inner loop: 0.0910
moving indexing out of loops: 0.0361
Luis' answer: 0.1417
"Luis' answer" is this one. It is by far fastest for small arrays, but for larger arrays it shows the cost of the permutation. Moving the computation of the first product out of the inner loop saves a bit over half the computation cost. But removing the inner loop reduces the cost quite dramatically (which I hadn't expected, I presume the single matrix product can use parallelism better than the many small element-wise products). We then get a further time reduction by reducing the amount of indexing operations within the loop.
This is the timing code:
function so()
n = 10; p = 20;
%n = 50; p = 100;
X = randn(n,p,3);
T1 = method1(X);
T2 = method2(X);
T3 = method3(X);
T4 = method4(X);
T5 = method5(X);
assert(max(abs(T1(:)-T2(:)))<1e-13)
assert(max(abs(T1(:)-T3(:)))<1e-13)
assert(max(abs(T1(:)-T4(:)))<1e-13)
assert(max(abs(T1(:)-T5(:)))<1e-13)
timeit(@()method1(X))
timeit(@()method2(X))
timeit(@()method3(X))
timeit(@()method4(X))
timeit(@()method5(X))
function T = method1(X)
p = size(X,2);
T = zeros(p, p, p);
for i = 1:p
    for j = 1:p
        for k = 1:p
            T(i, j, k) = sum(X(:, i, 1) .* X(:, j, 2) .* X(:, k, 3));
        end
    end
end
function T = method2(X)
p = size(X,2);
T = zeros(p, p, p);
for i = 1:p
    for j = 1:p
        Y = X(:, i, 1) .* X(:, j, 2);
        for k = 1:p
            T(i, j, k) = sum(Y .* X(:, k, 3));
        end
    end
end
function T = method3(X)
p = size(X,2);
T = zeros(p, p, p);
for i = 1:p
    for j = 1:p
        Y = X(:, i, 1) .* X(:, j, 2);
        T(i, j, :) = Y.' * X(:, :, 3);
    end
end
function T = method4(X)
p = size(X,2);
T = zeros(p, p, p);
X1 = X(:, :, 1);
X2 = X(:, :, 2);
X3 = X(:, :, 3);
for i = 1:p
    for j = 1:p
        Y = X1(:, i) .* X2(:, j);
        T(i, j, :) = Y.' * X3;
    end
end
function T = method5(X)
T = sum(permute(X(:,:,1), [2 4 5 3 1]) .* permute(X(:,:,2), [4 2 5 3 1]) .* permute(X(:,:,3), [4 5 2 3 1]), 5);
You have mentioned you are open to other languages, and NumPy's syntax is very close to MATLAB's, so we will try to come up with a NumPy-based solution here.
Now, these tensor-related sum-reductions, especially the matrix-multiplication ones, are easily expressed in Einstein notation, and NumPy luckily has a function for exactly that: np.einsum. Under the hood, it's implemented in C and is pretty efficient. Recently it has been optimized further to leverage BLAS-based matrix-multiplication implementations.
So, a translation of the stated code into NumPy territory, keeping in mind that it follows 0-based indexing and that the axes are visualized differently than the dimensions in MATLAB, would be -
import numpy as np
# X is a NumPy array of shape : (n,p,3). So, a random one could be
# generated with : `X = np.random.rand(n,p,3)`.
T = np.zeros((p, p, p))
for i in range(p):
    for j in range(p):
        for k in range(p):
            T[i, j, k] = np.sum(X[:, i, 0] * X[:, j, 1] * X[:, k, 2])
The einsum way to solve it would be -
np.einsum('ia,ib,ic->abc',X[...,0],X[...,1],X[...,2])
To leverage matrix-multiplication, use optimize flag -
np.einsum('ia,ib,ic->abc',X[...,0],X[...,1],X[...,2],optimize=True)
Timings (with large sizes)
In [27]: n,p = 100,100
...: X = np.random.rand(n,p,3)
In [28]: %%timeit
...: T = np.zeros((p, p, p))
...: for i in range(p):
...:     for j in range(p):
...:         for k in range(p):
...:             T[i, j, k] = np.sum(X[:, i, 0] * X[:, j, 1] * X[:, k, 2])
1 loop, best of 3: 6.23 s per loop
In [29]: %timeit np.einsum('ia,ib,ic->abc',X[...,0],X[...,1],X[...,2])
1 loop, best of 3: 353 ms per loop
In [31]: %timeit np.einsum('ia,ib,ic->abc',X[...,0],X[...,1],X[...,2],optimize=True)
100 loops, best of 3: 10.5 ms per loop
In [32]: 6230.0/10.5
Out[32]: 593.3333333333334
Around 600x speedup there!
I have two vectors, say A of size nx1, and B of size 1xm. I want to create a result matrix C (nxm) from a non-linear formula so that
C(i,j) = A(i)/(A(i)+B(j)).
I know I can do this with a loop such as:
for i = 1:n
    C(i,1:m) = A(i)./(A(i)+B(1:m));
end
but is it possible in some way to do this without using a loop?
EDIT: Thanks for your answers! As a small addition, before I saw them a friend suggested the following solution:
C = A*ones(1,m)./(ones(n,1)*B+A*ones(1,m))
If you are on MATLAB R2016a or earlier you'll want to use bsxfun to accomplish this
result = bsxfun(@rdivide, A, bsxfun(@plus, A, B));
If you are on R2016b or newer, then there is implicit expansion which allows you to remove bsxfun and just apply the element-operators directly
result = A ./ (A + B);
Benchmark
Here is a real benchmark using timeit to compare the execution speed of using bsxfun, repmat, implicit broadcasting and a for loop. As you can see from the results, the bsxfun and implicit broadcasting methods yield the fastest execution time.
function comparision()
sz = round(linspace(10, 5000, 30));
times1 = nan(size(sz));
times2 = nan(size(sz));
times3 = nan(size(sz));
times4 = nan(size(sz));
for k = 1:numel(sz)
    A = rand(sz(k), 1);
    B = rand(1, sz(k));
    times1(k) = timeit(@()option1(A,B));
    A = rand(sz(k), 1);
    B = rand(1, sz(k));
    times2(k) = timeit(@()option2(A,B));
    A = rand(sz(k), 1);
    B = rand(1, sz(k));
    times3(k) = timeit(@()option3(A,B));
    A = rand(sz(k), 1);
    B = rand(1, sz(k));
    times4(k) = timeit(@()option4(A,B));
end
figure
p(1) = plot(sz, times1 * 1000, 'DisplayName', 'bsxfun');
hold on
p(2) = plot(sz, times2 * 1000, 'DisplayName', 'repmat');
p(3) = plot(sz, times3 * 1000, 'DisplayName', 'implicit');
p(4) = plot(sz, times4 * 1000, 'DisplayName', 'for loop');
ylabel('Execution time (ms)')
xlabel('Elements in A')
legend(p)
end
function C = option1(A,B)
C = bsxfun(@rdivide, A, bsxfun(@plus, A, B));
end
function C = option2(A,B)
m = numel(B);
n = numel(A);
C = repmat(A,1,m) ./ (repmat(A,1,m) + repmat(B,n,1));
end
function C = option3(A, B)
C = A ./ (A + B);
end
function C = option4(A, B)
m = numel(B);
n = numel(A);
C = zeros(n, m);
for i = 1:n
    C(i,1:m) = A(i)./(A(i)+B(1:m));
end
end
See this answer for more information comparing implicit expansion and bsxfun.
Implicit expansion is the way to go if you have 2016b or newer, as mentioned by Suever. Another approach without that is to do element-wise operations after making A and B the same size as C, using repmat...
A1 = repmat(A,1,m);
B1 = repmat(B,n,1);
C = A1 ./ (A1 + B1);
Or in 1 line...
C = repmat(A,1,m) ./ (repmat(A,1,m) + repmat(B,n,1));
As a benchmark, I ran your loop method, the above repmat method, and Suever's bsxfun method for m = 1000, n = 100, averaging over 1,000 runs each:
Using for loop 0.00290520 sec
Using repmat 0.00014693 sec
Using bsxfun 0.00016402 sec
So for large matrices, repmat and bsxfun are comparable with repmat edging it. For small matrices though, just looping can be quicker than both, especially with a small n as that's your loop variable!
It's worth testing the given methods for your specific use case, as the timing results seem fairly variable depending on the inputs.
I need to write a function that works like this:
N1 = size(X,1);
N2 = size(Xtrain,1);
Dist = zeros(N1,N2);
for i = 1:N1
    for j = 1:N2
        Dist(i,j) = D - sum(X(i,:)==Xtrain(j,:));
    end
end
(X and Xtrain are sparse logical matrices)
It works fine and passes the tests, but I believe it's not a very optimal or well-written solution.
How can I improve this function using built-in MATLAB functions? I'm absolutely new to MATLAB, so I don't know if there really is an opportunity to make it better somehow.
You wanted to learn about vectorization; here is some code to study that compares different implementations of this pair-wise distance.
First we build two binary matrices as input (where each row is an instance):
m = 5;
n = 4;
p = 3;
A = double(rand(m,p) > 0.5);
B = double(rand(n,p) > 0.5);
1. double-loop over each pair of instances
D0 = zeros(m,n);
for i = 1:m
    for j = 1:n
        D0(i,j) = sum(A(i,:) ~= B(j,:)) / p;
    end
end
2. PDIST2
D1 = pdist2(A, B, 'hamming');
3. single-loop over each instance against all other instances
D2 = zeros(m,n);
for i = 1:n
    D2(:,i) = sum(bsxfun(@ne, A, B(i,:)), 2) ./ p;
end
4. vectorized with grid indexing, all against all
D3 = zeros(m,n);
[x,y] = ndgrid(1:m,1:n);
D3(:) = sum(A(x(:),:) ~= B(y(:),:), 2) ./ p;
5. vectorized in third dimension, all against all
D4 = sum(bsxfun(@ne, A, reshape(B.',[1 p n])), 2) ./ p;
D4 = permute(D4, [1 3 2]);
Finally, we check that all methods give equal results:
assert(isequal(D0,D1,D2,D3,D4))
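Applied back to your original function (assuming D equals size(X,2), i.e. the number of columns), the quantity you compute is just the un-normalized Hamming distance, so with pdist2 it becomes a one-liner; note that pdist2 expects full numeric inputs, hence the conversion of the sparse logical matrices:
Dist = size(X,2) * pdist2(full(double(X)), full(double(Xtrain)), 'hamming');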