Vectorize call to function of two vectors (treat matrix as array of vector) - matlab

I wish to compute the cumulative cosine distance between sets of vectors.
The natural representation of a set of vectors is a matrix...but how do I vectorize the following?
function d = cosdist(P1,P2)
ds = zeros(size(P1,2),1);
for k=1:size(P1,2)
%#used transpose() to avoid SO formatting on '
ds(k)=transpose(P1(:,k))*P2(:,k)/(norm(P1(:,k))*norm(P2(:,k)));
end
d = prod(ds);
end
I can of course write
fz = #(v1,v2) transpose(v1)*v2/(norm(v1)*norm(v2));
ds = cellfun(fz,P1,P2);
...so long as I recast my matrices as cell arrays of vectors. Is there a better / entirely numeric way?
Also, will cellfun, arrayfun, etc. take advantage of vector instructions and/or multithreading?
Note probably superfluous in present company but for column vectors v1'*v2 == dot(v1,v2) and is significantly faster in Matlab.

Since P1 and P2 are of the same size, you can do element-wise operations here. v1'*v equals sum(v1.*v2), by the way.
d = prod(sum(P1.*P2,1)./sqrt(sum(P1.^2,1) .* sum(P2.^2,1)));

#Jonas had the right idea, but the normalizing denominator might be incorrect. Try this instead:
%# matrix of column vectors
P1 = rand(5,8);
P2 = rand(5,8);
d = prod( sum(P1.*P2,1) ./ sqrt(sum(P1.^2,1).*sum(P2.^2,1)) );
You can compare this against the results returned by PDIST2 function:
%# PDIST2 returns one minus cosine distance between all pairs of vectors
d2 = prod( 1-diag(pdist2(P1',P2','cosine')) );

Related

Optimize nested for loop for calculating xcorr of matrix rows

I have 2 nested loops which do the following:
Get two rows of a matrix
Check if indices meet a condition or not
If they do: calculate xcorr between the two rows and put it into new vector
Find the index of the maximum value of sub vector and replace element of LAG matrix with this value
I dont know how I can speed this code up by vectorizing or otherwise.
b=size(data,1);
F=size(data,2);
LAG= zeros(b,b);
for i=1:b
for j=1:b
if j>i
x=data(i,:);
y=data(j,:);
d=xcorr(x,y);
d=d(:,F:(2*F)-1);
[M,I] = max(d);
LAG(i,j)=I-1;
d=xcorr(y,x);
d=d(:,F:(2*F)-1);
[M,I] = max(d);
LAG(j,i)=I-1;
end
end
end
First, a note on floating point precision...
You mention in a comment that your data contains the integers 0, 1, and 2. You would therefore expect a cross-correlation to give integer results. However, since the calculation is being done in double-precision, there appears to be some floating-point error introduced. This error can cause the results to be ever so slightly larger or smaller than integer values.
Since your calculations involve looking for the location of the maxima, then you could get slightly different results if there are repeated maximal integer values with added precision errors. For example, let's say you expect the value 10 to be the maximum and appear in indices 2 and 4 of a vector d. You might calculate d one way and get d(2) = 10 and d(4) = 10.00000000000001, with some added precision error. The maximum would therefore be located in index 4. If you use a different method to calculate d, you might get d(2) = 10 and d(4) = 9.99999999999999, with the error going in the opposite direction, causing the maximum to be located in index 2.
The solution? Round your cross-correlation data first:
d = round(xcorr(x, y));
This will eliminate the floating-point errors and give you the integer results you expect.
Now, on to the actual solutions...
Solution 1: Non-loop option
You can pass a matrix to xcorr and it will perform the cross-correlation for every pairwise combination of columns. Using this, you can forego your loops altogether like so:
d = round(xcorr(data.'));
[~, I] = max(d(F:(2*F)-1,:), [], 1);
LAG = reshape(I-1, b, b).';
Solution 2: Improved loop option
There are limits to how large data can be for the above solution, since it will produce large intermediate and output variables that can exceed the maximum array size available. In such a case for loops may be unavoidable, but you can improve upon the for-loop solution above. Specifically, you can compute the cross-correlation once for a pair (x, y), then just flip the result for the pair (y, x):
% Loop over rows:
for row = 1:b
% Loop over upper matrix triangle:
for col = (row+1):b
% Cross-correlation for upper triangle:
d = round(xcorr(data(row, :), data(col, :)));
[~, I] = max(d(:, F:(2*F)-1));
LAG(row, col) = I-1;
% Cross-correlation for lower triangle:
d = fliplr(d);
[~, I] = max(d(:, F:(2*F)-1));
LAG(col, row) = I-1;
end
end

Vectorize the pairwise kronecker product in matlab

Suppose there are two matrices of the same size, and I want to calculate the summation of their column-wise kronecker product. Due to sometimes the column size is quite large so that the speed could be very slow. Thus, is there anyway to vectorize this function or any function may help reducing the complexity in matlab? Thanks in advance.
The corresponding matlab code with a for-loop is provided below, and the answer of d is the interested output:
A = rand(3,7);
B = rand(3,7);
d = zeros(size(A,1)*size(B,1),1);
for i=1:size(A,2)
d = d + kron(A(:,i),B(:,i));
end
Using the rewriting of the Kronecker product given by Daniels answer
e=zeros(size(B,1),size(A,1));
for i=1:size(A,2)
e = e + B(:,i)*A(:,i).';
end
e=reshape(e,[],1);
we say that
C = A'
and thus
for i=1:m
e = e + B(:,i)*C(i,:);
end
which is the definition of the matrix product
B*C.
In conclusion the problem can thus be solved by the simple matrix product
d = reshape(B*A',[],1);
The kronecker product of two vectors is just a reshaped result of the matrix multiplication of both vectors:
e=zeros(size(B,1),size(A,1));
for i=1:size(A,2)
e = e + B(:,i)*A(:,i).';
end
e=reshape(e,[],1);
Now knowing that it's just a sum of products, it can be put into a single line using bsxfun
f=reshape(sum(bsxfun(#times,permute(B,[1,3,2]),permute(A,[3,1,2])),3),[],1);
Depending on the input, the bsxfun-sulution is slightly faster than the matrix multiplication, but that comes with a high memory consumption. The bsxfun-solution uses O(size(A,1)*size(B,1)*size(B,2)) while the for-loop only uses O(size(A,1)*size(B,1)) in addition to the input arguments.

MATLAB: bsxfun unclear. Want to accelerate minimum distance between segments

Using MATLAB,
Imagine a Nx6 array of numbers which represent N segments with 3+3=6 initial and end point coordinates.
Assume I have a function Calc_Dist( Segment_1, Segment_2 ) that takes as input two 1x6 arrays, and that after some operations returns a scalar, namely the minimal euclidean distance between these two segments.
I want to calculate the pairwise minimal distance between all N segments of my list, but would like to avoid a double loop to do so.
I cannot wrap my head around the documentation of the bsxfun function of MATLAB, so I cannot make this work. For the sake of a minimal example (the distance calculation is obviously not correct):
function scalar = calc_dist( segment_1, segment_2 )
scalar = sum( segment_1 + segment_2 )
end
and the main
Segments = rand( 1500, 6 )
Pairwise_Distance_Matrix = bsxfun( #calc_dist, segments, segments' )
Is there any way to do this, or am I forced to use double loops ?
Thank you for any suggestion
I think you need pdist rather than bsxfun. pdist can be used in two different ways, the second of which is applicable to your problem:
With built-in distance functions, supplied as strings, such as 'euclidean', 'hamming' etc.
With a custom distance function, a handle to which you supply.
In the second case, the distance function
must be of the form
function D2 = distfun(XI, XJ),
taking as arguments a 1-by-N vector XI containing a single row of X, an
M2-by-N matrix XJ containing multiple rows of X, and returning an
M2-by-1 vector of distances D2, whose Jth element is the distance
between the observations XI and XJ(J,:).
Although the documentation doesn't tell, it's very likely that the second way is not as efficient as the first (a double loop might even be faster, who knows), but you can use it. You would need to define your function so that it fulfills the stated condition. With your example function it's easy: for this part you'd use bsxfun:
function scalar = calc_dist( segment_1, segment_2 )
scalar = sum(bsxfun(#plus, segment_1, segment_2), 2);
end
Note also that
pdist works with rows (not columns), which is what you need.
pdist reduces operations by exploiting the properties that any distance function must have. Namely, the distance of an element to itself is known to be zero; and the distance for each pair can be computed just once thanks to symmetry. If you want to arrange the output in the form of a matrix, use squareform.
So, after your actual distance function has been modified appropriately (which may be the hard part), use:
distances = squareform(pdist(segments, #calc_dist));
For example:
N = 4;
segments = rand(N,6);
distances = squareform(pdist(segments, #calc_dist));
produces
distances =
0 6.1492 7.0886 5.5016
6.1492 0 6.8559 5.2688
7.0886 6.8559 0 6.2082
5.5016 5.2688 6.2082 0
Unfortunately I don't see any "smarter" (i.e. read faster) solution than the double loop. For speed consideration I'd organize the points as a 6×N array, not the other way, because column access is way faster than row access in MATLAB.
So:
N = 150000;
Segments = rand(6, N);
Pairwise_Distance_Matrix = Inf(N, N);
for i = 1:(N-1)
for j = (i+1):N
Pairwise_Distance_Matrix(i,j) = calc_dist(Segments(:,i), Segments(:,j));
end;
end;
Minimum_Pairwise_Distance = min(min(Pairwise_Distance_Matrix));
Contrary to common wisdom, explicit loops are faster now in MATLAB compared to the likes of arrayfun, cellfun or structfun; bsxfun beats everything else in terms of speed, but it doesn't apply to your case.

bsxfun-like for matrix product

I need to multiply a matrix A with n matrices, and get n matrices back. For example, multiply a 2x2 matrix with 3 2x2 matrices stacked as a 2x2x3 Matlab array. bsxfun is what I usually use for such situations, but it only applies for element-wise operations.
I could do something like:
blkdiag(a, a, a) * blkdiag(b(:,:,1), b(:,:,2), b(:,:,3))
but I need a solution for arbitrary n - ?
You can reshape the stacked matrices. Suppose you have k-by-k matrix a and a stack of m k-by-k matrices sb and you want the product a*sb(:,:,ii) for ii = 1..m. Then all you need is
sza = size(a);
b = reshape( b, sza(2), [] ); % concatenate all matrices aloong the second dim
res = a * b;
res = reshape( res, sza(1), [], size(sb,3) ); % stack back to 3d
Your solution can be adapted to arbitrary size using comma-saparated lists obtained from cell arrays:
[k m n] = size(B);
Acell = mat2cell(repmat(A,[1 1 n]),k,m,ones(1,n));
Bcell = mat2cell(B,k,m,ones(1,n));
blkdiag(Acell{:}) * blkdiag(Bcell{:});
You could then stack the blocks on a 3D array using this answer, and keep only the relevant ones.
But in this case a good old loop is probably faster:
C = NaN(size(B));
for nn = 1:n
C(:,:,nn) = A * B(:,:,nn);
end
For large stacks of matrices and/or vectors over which to execute matrix multiplication, speed can start becoming an issue. To avoid re-inventing the wheel, you could simply compile and use the following fast MEX code:
MTIMESX - Mathworks.
As a rule of thumb, MATLAB is often quite inefficient at executing for loops over large numbers of operations which look like they should be vectorizable; I cannot think of a straightforward way of generalising Shai's answer to this case.

Matlab's bsxfun() code

What does this do?
u = [5 6];
s = [1 1];
data1 =[randn(10,1) -1*ones(10,1)];
data2 =[randn(10,1) ones(10,1)];
data = [data1; data2];
deviance = bsxfun(#minus,data,u);
deviance = bsxfun(#rdivide,deviance,s);
deviance = deviance .^ 2;
deviance = bsxfun(#plus,deviance,2*log(abs(s)));
[dummy,mini] = min(deviance,[],2);
Is there an equivalent way of doing this without bsxfun?
The function BSXFUN will perform the requested element-wise operation (function handle argument) by replicating dimensions of the two input arguments so that they match each other in size. You can avoid the use of BSXFUN in this case by replicating the variables u and s yourself using the function REPMAT to make them each the same size as data. Then you can use the standard element-wise arithmetic operators:
u = repmat(u,size(data,1),1); %# Replicate u so it becomes a 20-by-2 array
s = repmat(s,size(data,1),1); %# Replicate s so it becomes a 20-by-2 array
deviance = ((data-u)./s).^2 + 2.*log(abs(s)); %# Shortened to one line
bsxfun does binary operations element wise. It's useful when you need to subtract a vector (in this case u) from all the elements along a particular dimension in a matrix (in this case data). The dimension along which the operation is being performed must match in both cases. For your example, you can incorporate the code without bsxfun as
u1=repmat(u,size(data,2),1);
deviance=data-u1;
and so on for the other operations.