Calculate distance matrix for 3D points - matlab

I have the lists xA, yA, zA and the lists xB, yB, zB in Matlab. The contain the the x,y and z coordinates of points of type A and type B. There may be different numbers of type A and type B points.
I would like to calculate a matrix containing the Euclidean distances between points of type A and type B. Of course, I only need to calculate one half of the matrix, since the other half contains duplicate data.
What is the most efficient way to do that?
When I'm done with that, I want to find the points of type B that are closest to one point of type A. How do I then find the coordinates the closest, second closest, third closest and so on points of type B?

Given a matrix A of size [N,3] and a matrix B of size [M,3], you can use the pdist2 function to get a matrix of size [M,N] containing all the pairwise distances.
If you want to order the points from B by their distance to the rth point in A then you can sort the rth row of the pairwise distance matrix.
% generate some example data
N = 4
M = 7
A = randn(N,3)
B = randn(M,3)
% compute N x M matrix containing pairwise distances
D = pdist2(A, B, 'euclidean')
% sort points in B by their distance to the rth point in A
r = 3
[~, b_idx] = sort(D(r,:))
Then b_idx will contain the indices of the points in B sorted by their distance to the rth point in A. So the actual points in B ordered by b_idx can be obtained with B(b_idx,:), which has the same size as B.
If you want to do this for all r you could use
[~, B_idx] = sort(D, 1)
to sort all the rows of D at the same time. Then the rth row of B_idx will contain b_idx.
If your only objective is to find the closest k points in B for each point in A (for some positive integer k which is less than M), then we would not generally want to compute all the pairwise distances. This is because space-partitioning data structures like K-D-trees can be used to improve the efficiency of searching without explicitly computing all the pairwise distances.
Matlab provides a knnsearch function that uses K-D-trees for this exact purpose. For example, if we do
k = 2
B_kidx = knnsearch(B, A, 'K', k)
then B_kidx will be the first two columns of B_idx, i.e. for each point in A the indices of the nearest two points in B. Also, note that this is only going to be more efficient than the pdist2 method when k is relatively small. If k is too large then knnsearch will automatically use the explicit method from before instead of the K-D-tree approach.

Related

Matlab calculating nearest neighbour distance for all (u, v) vectors in an array

I am trying to calculate the distance between nearest neighbours within a nx2 matrix like the one shown below
point_coordinates =
11.4179 103.1400
16.7710 10.6691
16.6068 119.7024
25.1379 74.3382
30.3651 23.2635
31.7231 105.9109
31.8653 36.9388
%for loop going from the top of the vector column to the bottom
for counter = 1:size(point_coordinates,1)
%current point defined selected
current_point = point_coordinates(counter,:);
%math to calculate distance between the current point and all the points
distance_search= point_coordinates-repmat(current_point,[size(point_coordinates,1) 1]);
dist_from_current_point = sqrt(distance_search(:,1).^2+distance_search(:,2).^2);
%line to omit self subtraction that gives zero
dist_from_current_point (dist_from_current_point <= 0)=[];
%gives the shortest distance calculated for a certain vector and current_point
nearest_dist=min(dist_from_current_point);
end
%final line to plot the u,v vectors and the corresponding nearest neighbour
%distances
matnndist = [point_coordinates nearest_dist]
I am not sure how to structure the 'for' loop/nearest_neighbour line to be able to get the nearest neighbour distance for each u,v vector.
I would like to have, for example ;
for the first vector you could have the coordinates and the corresponding shortest distance, for the second vector another its shortest distance, and this goes on till n
Hope someone can help.
Thanks
I understand you want to obtain the minimum distance between different points.
You can compute the distance for each pair of points with bsxfun; remove self-distances; minimize. It's more computationally efficient to work with squared distances, and take the square root only at the end.
n = size(point_coordinates,1);
dist = bsxfun(#minus, point_coordinates(:,1), point_coordinates(:,1).').^2 + ...
bsxfun(#minus, point_coordinates(:,2), point_coordinates(:,2).').^2;
dist(1:n+1:end) = inf; %// remove self-distances
min_dist = sqrt(min(dist(:)));
Alternatively, you could use pdist. This avoids computing each distance twice, and also avoids self-distances:
dist = pdist(point_coordinates);
min_dist = min(dist(:));
If I can suggest a built-in function, use knnsearch from the statistics toolbox. What you are essentially doing is a K-Nearest Neighbour (KNN) algorithm, but you are ignoring self-distances. The way you would call knnsearch is in the following way:
[idx,d] = knnsearch(X, Y, 'k', k);
In simple terms, the KNN algorithm returns the k closest points to your data set given a query point. Usually, the Euclidean distance is the distance metric that is used. For MATLAB's knnsearch, X is a 2D array that consists of your dataset where each row is an observation and each column is a variable. Y would be the query points. Y is also a 2D array where each row is a query point and you need to have the same number of columns as X. We would also specify the flag 'k' to denote how many closest points you want returned. By default, k = 1.
As such, idx would be a N x K matrix, where N is the total number of query points (number of rows of Y) and K would be those k closest points to the dataset for each query point we have. idx indicates the particular points in your dataset that were closest to each query. d is also a N x K matrix that returns the smallest distances for these corresponding closest points.
As such, what you want to do is find the closest point for your dataset to each of the other points, ignoring self-distances. Therefore, you would set both X and Y to be the same, and set k = 2, discarding the first column of both outputs to get the result you're looking for.
Therefore:
[idx,d] = knnsearch(point_coordinates, point_coordinates, 'k', 2)
idx = idx(:,2);
d = d(:,2);
We thus get for idx and d:
>> idx
idx =
3
5
1
1
7
3
5
>> d
d =
17.3562
18.5316
17.3562
31.9027
13.7573
20.4624
13.7573
As such, this tells us that for the first point in your data set, it matched with point #3 the best. This matched with the closest distance of 17.3562. For the second point in your data set, it matched with point #5 the best with the closest distance being 18.5316. You can continue on with the rest of the results in a similar pattern.
If you don't have access to the statistics toolbox, consider reading my StackOverflow post on how I compute KNN from first principles.
Finding K-nearest neighbors and its implementation
In fact, it is very similar to Luis Mendo's post to you earlier.
Good luck!

Find the minimum difference between any pair of elements between two vectors

Which of the following statements will find the minimum difference between any pair of elements (a,b) where a is from the vector A and b is from the vector B.
A. [X,Y] = meshgrid(A,B);
min(abs(X-Y))
B. [X,Y] = meshgrid(A,B);
min(abs(min(Y-X)))
C. min(abs(A-B))
D. [X,Y] = meshgrid(A,B);
min(min(abs(X-Y)))
Can someone please explain to me?
By saying "minimum difference between any pair of elements(a,b)", I presume you mean that you are treating A and B as sets and you intend to find the absolute difference in any possible pair of elements from these two sets. So in this case you should use your option D
[X,Y] = meshgrid(A,B);
min(min(abs(X-Y)))
Explanation: Meshgrid turns a pair of 1-D vectors into 2-D grids. This link can explain what I mean to say:
http://www.mathworks.com/help/matlab/ref/meshgrid.html?s_tid=gn_loc_drop
Hence (X-Y) will give the difference in all possible pairs (a,b) such that a belongs to A and b belongs to B. Note that this will be a 2-D matrix.
abs(X-Y) would return the absolute values of all elements in this matrix (the absolute difference in each pair).
To find the smallest element in this matrix you will have to use min(min(abs(X-Y))). This is because if Z is a matrix, min(Z) treats the columns of Z as vectors, returning a row vector containing the minimum element from each column. So a single min command will give a row vector with each element being the min of the elements of that column. Using min for a second time returns the min of this row vector. This would be the smallest element in the entire matrix.
This can help:
http://www.mathworks.com/help/matlab/ref/min.html?searchHighlight=min
Options C is correct if you treat A and B as vectors and not sets. In this case you won't be considering all possible pairs. You'll end up finding the minimum of (a-b) where a,b are both in the same position in their corresponding vectors (pair-wise difference).
D. [X,Y] = meshgrid(A,B);
min(min(abs(X-Y)))
meshgrid will generate two grids - X and Y - from the vectors, which are arranged so that X-Y will generate all combinations of ax-bx where ax is in a and bx is in b.
The rest of the expression just gets the minimum absolute value from the array resulting from the subtraction, which is the value you want.
CORRECT ANSWER IS D
Let m = size(A) and n = size(B)
You want to subtract each pair of (a,b) such that a is from vector A and b is from vector B.
meshgrid(A,B) creates two matrices X Y both of size nxm where X have rows sames have vector A while Yhas columns same as vector B .
Hence , Z = X-Y will give you a matrix with n*m values corresponding to the difference between each pair of values taken from A and B . Now all you have to do is to find the absolute minimum among all values of Z.
You can do that by
req_min = min(min(abs(z)))
The whole code is
[X Y ] = meshgrid(A,B);
Z= X-Y;
Z = abs(Z);
req_min = min(min(Z));
You could also use bsxfun instead of meshgrid:
min(min(abs(bsxfun(#minus, A(:), B(:).'))))
Or use pdist2:
min(min(pdist2(A(:),B(:))))

Euclidean distance between two columns of two vector Matlab

I have two vectors A & B of size 250x4. The first column in each vector has the X values and the second column has the Y values. I want to calculate the euclidean distance between each the X & Y of each row in the two vectors and save the result in a new vector C of size 250x1 which holds the result of the euclidean distance. For example, if the first row in A is A1x, A1y, A1n, A1m and the first row in B is B1x, B1y, B1n, B1m so I want to get the eucledian distance which will be [(A1x-B1x)^2 + (A1y-B1y)^2]^0.5 and the result will be saved in C1 and same will be done for the rest of the 250 rows. So if anyone could please advise how to do this in Matlab.
Like this:
%// First extract on x-y data from A and B
Axy = A(:,1:2);
Bxy = B(:,1:2);
%// Find all euclidean distances (row-wise)
C1 = sqrt(sum((Axy-Bxy).^2,2));
plus it handles higher dimension too
use pdist2:
C1=diag(pdist2(A(:,1:2),B(:,1:2)));
Actually, pdist2 will give you a 250x250 matrix, because it calculate all the distances. You need only the main diagonal, so calling diag on the result (as in the code above) will produce the wanted result.

Distance matrix calculation

I hope you will have the right answer to this question, which is rather challanging.
I have two uni-dimensional vectors Y and Z, which hold the coordinates of N grid points located on a square grid. So
Ny points along Y
Nz points along Z
N = Ny*Nz
Y = Y[N]; (Y holds N entries)
Z = Z[N]; (Z holds N entries)
Now, the goal would be generating the distance matrix D, which holds N*N entries: so each row of matrix D is defined as the distance between the i-th point on the grid and the remnant (N - i) points.
Generally, to compute the whole matrix I would call
D = squareform(pdist([Y Z]));
or,equivalently,
D = pdist2([Y Z],[Y Z]).
But, since D is a symmetric matrix, I'd like to generate only the N(N + 1)/2 independent entries and store them into a row-ordered vector Dd.
So the question is: how to generate a row-ordered array Dd whose entries are defined by the lower triangular terms of matrix D? I'd, furthermore, like storing the entries in a column-major order.
I hope the explanation is clear enough.
As woodchips commented, it is simpler and faster to compute the whole matrix and extract the elements you care about. Here's a way to do it:
ndx = find(tril(true(size(D))));
Dd = D(ndx);
If you absolutely must compute elements of Dd without computing matrix D first, you probably need a double for loop.

Comparing two sets of vectors

I've got matrices A and B
size(A) = [n x]; size(B) = [n y];
Now I need to compare euclidian distance of each column vector of A from each column vector of B. I'm using dist method right now
Q = dist([A B]); Q = Q(1:x, x:end);
But it does also lot of needless work (like calculating distances between vectors of A and B separately).
What is the best way to calculate this?
You are looking for pdist2.
% Compute the ordinary Euclidean distance
D = pdist2(A.',B.','euclidean'); % euclidean distance
You should take the transpose of the matrices since pdist2 assumes the observations are in rows, not in columns.
An alternative solution to pdist2, if you don't have the Statistics Toolbox, is to compute this manually. For example, one way to do it is:
[X, Y] = meshgrid(1:size(A, 2), 1:size(B, 2)); %// or meshgrid(1:x, 1:y)
Q = sqrt(sum((A(:, X(:)) - B(:, Y(:))) .^ 2, 1));
The indices of the columns from A and B for each value in vector Q can be obtained by computing:
[X(:), Y(:)]
where each row contains a pair of indices: the first is the column index in matrix A, and the second is the column index in matrix B.
Another solution if you don't have pdist2 and which may also be faster for very large matrices is to vectorize the following mathematical fact:
||x-y||^2 = ||x||^2 + ||y||^2 - 2*dot(x,y)
where ||a|| is the L2-norm (euclidean norm) of a.
Comments:
C=-2*A'*B (this is a x by y matrix) is the vectorization of the dot products.
||x-y||^2 is the square of the euclidean distance which you are looking for.
Is that enough or do you need the explicit code?
The reason this may be faster asymptotically is that you avoid doing the metric calculation for all x*y comparisons, since you are instead making the bottleneck a matrix multiplication (matrix multiplication is highly optimized in matlab). You are taking advantage of the fact that this is the euclidean distance and not just some unknown metric.