Matlab: Argmax and dot product for each row in a matrix - matlab

I have 2 matrices = X in R^(n*m) and W in R^(k*m) where k<<n.
Let x_i be the i-th row of X and w_j be the j-th row of W.
I need to find, for each x_i what is the j that maximizes <w_j,x_i>
I can't see a way around iterating over all the rows in X, but it there a way to find the maximum dot product without iterating every time over all of W?
A naive implementation would be:
n = 100;
m = 50;
k = 10;
X = rand(n,m);
W = rand(k,m);
Y = zeros(n, 1);
for i = 1 : n
max_ind = 1;
max_val = dot(W(1,:), X(i,:));
for j = 2 : k
cur_val = dot(W(j,:),X(i,:));
if cur_val > max_val
max_val = cur_val;
max_ind = j;
end
end
Y(i,:) = max_ind;
end

Dot product is essentially matrix multiplication:
[~, Y] = max(W*X');

bsxfun based approach to speed-up things for you -
[~,Y] = max(sum(bsxfun(#times,X,permute(W,[3 2 1])),2),[],3)
On my system, using your dataset I am getting a 100x+ speedup with this.
One can think of two more "closeby" approaches, but they don't seem to give any huge improvement over the earlier one -
[~,Y] = max(squeeze(sum(bsxfun(#times,X,permute(W,[3 2 1])),2)),[],2)
and
[~,Y] = max(squeeze(sum(bsxfun(#times,X',permute(W,[2 3 1]))))')

Related

average bins along a dimension of a nd array in matlab

To compute the mean of every bins along a dimension of a nd array in matlab, for example, average every 10 elements along dim 4 of a 4d array
x = reshape(1:30*30*20*300,30,30,20,300);
n = 10;
m = size(x,4)/10;
y = nan(30,30,20,m);
for ii = 1 : m
y(:,:,:,ii) = mean(x(:,:,:,(1:n)+(ii-1)*n),4);
end
It looks a bit silly. I think there must be better ways to average the bins?
Besides, is it possible to make the script applicable to general cases, namely, arbitray ndims of array and along an arbitray dim to average?
For the second part of your question you can use this:
x = reshape(1:30*30*20*300,30,30,20,300);
dim = 4;
n = 10;
m = size(x,dim)/10;
y = nan(30,30,20,m);
idx1 = repmat({':'},1,ndims(x));
idx2 = repmat({':'},1,ndims(x));
for ii = 1 : m
idx1{dim} = ii;
idx2{dim} = (1:n)+(ii-1)*n;
y(idx1{:}) = mean(x(idx2{:}),dim);
end
For the first part of the question here is an alternative using cumsum and diff, but it may not be better then the loop solution:
function y = slicedmean(x,slice_size,dim)
s = cumsum(x,dim);
idx1 = repmat({':'},1,ndims(x));
idx2 = repmat({':'},1,ndims(x));
idx1{dim} = slice_size;
idx2{dim} = slice_size:slice_size:size(x,dim);
y = cat(dim,s(idx1{:}),diff(s(idx2{:}),[],dim))/slice_size;
end
Here is a generic solution, using the accumarray function. I haven't tested how fast it is. There might be some room for improvement though.
Basically, accumarray groups the value in x following a matrix of customized index for your question
x = reshape(1:30*30*20*300,30,30,20,300);
s = size(x);
% parameters for averaging
dimAv = 4;
n = 10;
% get linear index
ix = (1:numel(x))';
% transform them to a matrix of index per dimension
% this is a customized version of ind2sub
pcum = [1 cumprod(s(1:end-1))];
sub = zeros(numel(ix),numel(s));
for i = numel(s):-1:1,
ixtmp = rem(ix-1, pcum(i)) + 1;
sub(:,i) = (ix - ixtmp)/pcum(i) + 1;
ix = ixtmp;
end
% correct index for the given dimension
sub(:,dimAv) = floor((sub(:,dimAv)-1)/n)+1;
% run the accumarray to compute the average
sout = s;
sout(dimAv) = ceil(sout(dimAv)/n);
y = accumarray(sub,x(:), sout, #mean);
If you need a faster and memory efficient operation, you'll have to write your own mex function. It shouldn't be so difficult, I think !

matlab vectorization and avoid for loop

There are two matrix X and M and I need to obtain the following matrix D
m = 20; n = 10;
X = rand(m,n);
M = rand(m,m);
M = (M + M')/2;
D = zeros(n,n);
for i = 1:n
for j = 1:n
D(i,j) = X(:,i)'*M*X(:,j);
end
end
When n and m are large, the computation of D is very slow. Is there any way to speed up?
The answer would be:
D = 0.5*X.'*(M+M')*X
(This is a slight modification of the solution provided by Divakar, so that the correct matrix D is returned)

Smarter way to generate a matrix of zeros and ones in Matlab

I would like to generate all the possible adjacency matrices (zero diagonale) of an undirected graph of n nodes.
For example, with no relabeling for n=3 we get 23(3-1)/2 = 8 possible network configurations (or adjacency matrices).
One solution that works for n = 3 (and which I think is quite stupid) would be the following:
n = 3;
A = [];
for k = 0:1
for j = 0:1
for i = 0:1
m = [0 , i , j ; i , 0 , k ; j , k , 0 ];
A = [A, m];
end
end
end
Also I though of the following which seems to be faster but something is wrong with my indexing since 2 matrices are missing:
n = 3
C = [];
E = [];
A = zeros(n);
for i = 1:n
for j = i+1:n
A(i,j) = 1;
A(j,i) = 1;
C = [C,A];
end
end
B = ones(n);
B = B- diag(diag(ones(n)));
for i = 1:n
for j = i+1:n
B(i,j) = 0;
B(j,i) = 0;
E = [E,B];
end
end
D = [C,E]
Is there a faster way of doing this?
I would definitely generate the off-diagonal elements of the adjacency matrices with binary encoding:
n = 4; %// number of nodes
m = n*(n-1)/2;
offdiags = dec2bin(0:2^m-1,m)-48; %//every 2^m-1 possible configurations
If you have the Statistics and Machine Learning Toolbox, then squareform will easily create the matrices for you, one by one:
%// this is basically a for loop
tmpcell = arrayfun(#(k) squareform(offdiags(k,:)),1:size(offdiags,1),...
'uniformoutput',false);
A = cat(2,tmpcell{:}); %// concatenate the matrices in tmpcell
Although I'd consider concatenating along dimension 3, then you can see each matrix individually and conveniently.
Alternatively, you can do the array synthesis yourself in a vectorized way, it's probably even quicker (at the cost of more memory):
A = zeros(n,n,2^m);
%// lazy person's indexing scheme:
[ind_i,ind_j,ind_k] = meshgrid(1:n,1:n,1:2^m);
A(ind_i>ind_j) = offdiags.'; %'// watch out for the transpose
%// copy to upper diagonal:
A = A + permute(A,[2 1 3]); %// n x n x 2^m matrix
%// reshape to n*[] matrix if you wish
A = reshape(A,n,[]); %// n x (n*2^m) matrix

Computing Mahalanobis Distance Between Set of Points and Set of Reference Points

I have an n x p matrix - mX which is composed of n points in R^p.
I have another m x p matrix - mY which is composed of m reference points in R^p.
I would like to create an n x m matrix - mD which is the Mahalanobis Distance matrix.
D(i, j) means the Mahalanobis Distance between point i in mX, mX(i, :) and point j in mY, mY(j, :).
Namely, is computes the following:
mD(i, j) = (mX(i, :) - mY(j, :)) * inv(mC) * (mX(i, :) - mY(j, :)).';
Where mC is the given Mahalanobis Distance PSD Matrix.
It is easy to be done in a loop, is there a way to vectorize it?
Namely, is the a function which its inputs are mX, mY and mC and its output is mD and fully vectorized without using any MATLAB toolbox?
Thank You.
Approach #1
Assuming infinite resources, here's one vectorized solution using bsxfun and matrix-multiplication -
A = reshape(bsxfun(#minus,permute(mX,[1 3 2]),permute(mY,[3 1 2])),[],p);
out = reshape(diag(A*inv(mC)*A.'),n,m);
Approach #2
Here's a comprise solution trying to reduce the loop complexity -
A = reshape(bsxfun(#minus,permute(mX,[1 3 2]),permute(mY,[3 1 2])),[],p);
imC = inv(mC);
out = zeros(n*m,1);
for ii = 1:n*m
out(ii) = A(ii,:)*imC*A(ii,:).';
end
out = reshape(out,n,m);
Sample run -
>> n = 3; m = 4; p = 5;
mX = rand(n,p);
mY = rand(m,p);
mC = rand(p,p);
imC = inv(mC);
>> %// Original solution
for i = 1:n
for j = 1:m
mD(i, j) = (mX(i, :) - mY(j, :)) * inv(mC) * (mX(i, :) - mY(j, :)).'; %//'
end
end
>> mD
mD =
-8.4256 10.032 2.8929 7.1762
-44.748 -4.3851 -13.645 -9.6702
-4.5297 3.2928 0.11132 2.5998
>> %// Approach #1
A = reshape(bsxfun(#minus,permute(mX,[1 3 2]),permute(mY,[3 1 2])),[],p);
out = reshape(diag(A*inv(mC)*A.'),n,m); %//'
>> out
out =
-8.4256 10.032 2.8929 7.1762
-44.748 -4.3851 -13.645 -9.6702
-4.5297 3.2928 0.11132 2.5998
>> %// Approach #2
A = reshape(bsxfun(#minus,permute(mX,[1 3 2]),permute(mY,[3 1 2])),[],p);
imC = inv(mC);
out1 = zeros(n*m,1);
for ii = 1:n*m
out1(ii) = A(ii,:)*imC*A(ii,:).'; %//'
end
out1 = reshape(out1,n,m);
>> out1
out1 =
-8.4256 10.032 2.8929 7.1762
-44.748 -4.3851 -13.645 -9.6702
-4.5297 3.2928 0.11132 2.5998
Instead if you had :
mD(j, i) = (mX(i, :) - mY(j, :)) * inv(mC) * (mX(i, :) - mY(j, :)).';
The solutions would translate to the versions listed next.
Approach #1
A = reshape(bsxfun(#minus,permute(mY,[1 3 2]),permute(mX,[3 1 2])),[],p);
out = reshape(diag(A*inv(mC)*A.'),m,n);
Approach #2
A = reshape(bsxfun(#minus,permute(mY,[1 3 2]),permute(mX,[3 1 2])),[],p);
imC = inv(mC);
out1 = zeros(m*n,1);
for i = 1:n*m
out(i) = A(i,:)*imC*A(i,:).'; %//'
end
out = reshape(out,m,n);
Sample run -
>> n = 3; m = 4; p = 5;
mX = rand(n,p); mY = rand(m,p); mC = rand(p,p); imC = inv(mC);
>> %// Original solution
for i = 1:n
for j = 1:m
mD(j, i) = (mX(i, :) - mY(j, :)) * inv(mC) * (mX(i, :) - mY(j, :)).'; %//'
end
end
>> mD
mD =
0.81755 0.33205 0.82254
1.7086 1.3363 2.4209
0.36495 0.78394 -0.33097
0.17359 0.3889 -1.0624
>> %// Approach #1
A = reshape(bsxfun(#minus,permute(mY,[1 3 2]),permute(mX,[3 1 2])),[],p);
out = reshape(diag(A*inv(mC)*A.'),m,n); %//'
>> out
out =
0.81755 0.33205 0.82254
1.7086 1.3363 2.4209
0.36495 0.78394 -0.33097
0.17359 0.3889 -1.0624
>> %// Approach #2
A = reshape(bsxfun(#minus,permute(mY,[1 3 2]),permute(mX,[3 1 2])),[],p);
imC = inv(mC);
out1 = zeros(m*n,1);
for i = 1:n*m
out1(i) = A(i,:)*imC*A(i,:).'; %//'
end
out1 = reshape(out1,m,n);
>> out1
out1 =
0.81755 0.33205 0.82254
1.7086 1.3363 2.4209
0.36495 0.78394 -0.33097
0.17359 0.3889 -1.0624
Here is one solution that eliminates one loop
function d = mahalanobis(mX, mY)
n = size(mX, 2);
m = size(mY, 2);
data = [mX, mY];
mc = cov(transpose(data));
dist = zeros(n,m);
for i = 1 : n
diff = repmat(mX(:,i), 1, m) - mY;
dist(i,:) = sum((mc\diff).*diff , 1);
end
d = sqrt(dist);
end
You would invoke it as:
d = mahalanobis(transpose(X),transpose(Y))
Reduce to L2
It seems that Mahalanobis Distance can be reduced to ordinary L2 distance if you are allowed to preprocess matrix mC and you are not afraid of numerical differences.
First of all, compute Cholesky decomposition of mC:
mR = chol(mC) % C = R^t * R, where R is upper-triangular
Now we can use these factors to reformulate Mahalanobis Distance:
(Xi-Yj) * inv(C) * (Xi-Yj)^t = || (Xi-Yj) inv(R) ||^2 = ||TXi - TYj||^2
where: TXi = Xi * inv(R)
TYj = Yj * inv(R)
So the idea is to transform points Xi, Yj to TXi, TYj first, and then compute euclidean distances between them. Here is the algorithm outline:
Compute mR - Cholesky factor of covariance matrix mC (takes O(p^3) time).
Invert triangular matrix mR (takes O(p^3) time).
Multiply both mX and mY by inv(mR) on the right (takes O(p^2 (m+n)) time).
Compute squared L2 distances between pairs of points (takes O(m n p) time).
Total time is O(m n p + (m + n) p^2 + p^3) versus original O(m n p^2). It should work faster when 1 << p << n,m. In such case step 4 would takes most of the time and should be vectorized.
Vectorization
I have little experience of MATLAB, but quite a lot of SIMD vectorization on x86 CPUs. In raw computations, it would be enough to vectorize along one sufficiently large array dimension, and make trivial loops for the other dimensions.
If you expect p to be large enough, it may probably be OK to vectorize along coordinates of points, and make two nested loops for i <= n and j <= m. That's similar to what #Daniel posted.
If p is not sufficiently large, you can vectorize along one of the point sequences instead. This would be similar to solution posted by #dpmcmlxxvi: you have to subtract single row of one matrix from all the rows of the second matrix, then compute squared norms of the resulting rows. Repeat n times (
or m times).
As for me, full vectorization (which means rewriting with matrix operations instead of loops in MATLAB) does not sound like a clever performance goal. Most likely partially vectorized solutions would be optimally fast.
I came to the conclusion that vectorizing this problem is not efficient. My best idea for vectorizing this problem would require m x n x p x p working memory, at least if everything is processed at once. This means with n=m=p=152 the code would already require 4GB Ram. At these dimensions, my system can run the loop in less than a second:
mD=zeros(size(mX,1),size(mY,1));
ImC=inv(mC);
for i=1:size(mX,1)
for j=1:size(mY,1)
d=mX(i, :) - mY(j, :);
mD(i, j) = (d) * ImC * (d).';
end
end

Subtracting each elements of a row vector , size (1 x n) from a matrix of size (m x n)

I have two matrices of big sizes, which are something similar to the following matrices.
m; with size 1000 by 10
n; with size 1 by 10.
I would like to subtract each element of n from all elements of m to get ten different matrices, each has size of 1000 by 10.
I started as follows
clc;clear;
nrow = 10000;
ncol = 10;
t = length(n)
for i = 1:nrow;
for j = 1:ncol;
for t = 1:length(n);
m1(i,j) = m(i,j)-n(1);
m2(i,j) = m(i,j)-n(2);
m3(i,j) = m(i,j)-n(3);
m4(i,j) = m(i,j)-n(4);
m5(i,j) = m(i,j)-n(5);
m6(i,j) = m(i,j)-n(6);
m7(i,j) = m(i,j)-n(7);
m8(i,j) = m(i,j)-n(8);
m9(i,j) = m(i,j)-n(9);
m10(i,j) = m(i,j)-n(10);
end
end
end
can any one help me how can I do it without writing the ten equations inside the loop? Or can suggest me any convenient way especially when the two matrices has many columns.
Why can't you just do this:
m01 = m - n(1);
...
m10 = m - n(10);
What do you need the loop for?
Even better:
N = length(n);
m2 = cell(N, 1);
for k = 1:N
m2{k} = m - n(k);
end
Here we go loopless:
nrow = 10000;
ncol = 10;
%example data
m = ones(nrow,ncol);
n = 1:ncol;
M = repmat(m,1,1,ncol);
N = permute( repmat(n,nrow,1,ncol) , [1 3 2] );
result = bsxfun(#minus, M, N );
%or just
result = M-N;
Elapsed time is 0.018499 seconds.
or as recommended by Luis Mendo:
M = repmat(m,1,1,ncol);
result = bsxfun(#minus, m, permute(n, [1 3 2]) );
Elapsed time is 0.000094 seconds.
please make sure that your input vectors have the same orientation like in my example, otherwise you could get in trouble. You should be able to obtain that by transposements or you have to modify this line:
permute( repmat(n,nrow,1,ncol) , [1 3 2] )
according to your needs.
You mentioned in a comment that you want to count the negative elements in each of the obtained columns:
A = result; %backup results
A(A > 0) = 0; %set non-negative elements to zero
D = sum( logical(A),3 );
which will return the desired 10000x10 matrix with quantities of negative elements. (Please verify it, I may got a little confused with the dimensions ;))
Create the three dimensional result matrix. Store your results, for example, in third dimension.
clc;clear;
nrow = 10000;
ncol = 10;
N = length(n);
resultMatrix = zeros(nrow, ncol, N);
neg = zeros(ncol, N); % amount of negative values
for j = 1:ncol
for i = 1:nrow
for t = 1:N
resultMatrix(i,j,t) = m(i,j) - n(t);
end
end
for t = 1:N
neg(j,t) = length( find(resultMatrix(:,j,t) < 0) );
end
end