Finding maximum/minimum distance of two rows in a matrix using MATLAB - matlab

Say we have a matrix m x n where the number of rows of the matrix is very big. If we assume each row is a vector, then how could one find the maximum/minimum distance between vectors in this matrix?

My suggestion would be to use pdist. This computes pairs of Euclidean distances between unique combinations of observations like #seb has suggested, but this is already built into MATLAB. Your matrix is already formatted nicely for pdist where each row is an observation and each column is a variable.
Once you do apply pdist, apply squareform so that you can display the distance between pairwise entries in a more pleasant matrix form. The (i,j) entry for each value in this matrix tells you the distance between the ith and jth row. Also note that this matrix will be symmetric and the distances along the diagonal will inevitably equal to 0, as any vector's distance to itself must be zero. If your minimum distance between two different vectors were zero, if we were to search this matrix, then it may possibly report a self-distance instead of the actual distance between two different vectors. As such, in this matrix, you should set the diagonals of this matrix to NaN to avoid outputting these.
As such, assuming your matrix is A, all you have to do is this:
distValues = pdist(A); %// Compute pairwise distances
minDist = min(distValues); %// Find minimum distance
maxDist = max(distValues); %// Find maximum distance
distMatrix = squareform(distValues); %// Prettify
distMatrix(logical(eye(size(distMatrix)))) = NaN; %// Ignore self-distances
[minI,minJ] = find(distMatrix == minDist, 1); %// Find the two vectors with min. distance
[maxI,maxJ] = find(distMatrix == maxDist, 1); %// Find the two vectors with max. distance
minI, minJ, maxI, maxJ will return the two rows of A that produced the smallest distance and the largest distance respectively. Note that with the find statement, I have made the second parameter 1 so that it only returns one pair of vectors that have this minimum / maximum distance between each other. However, if you omit this parameter, then it will return all possible pairs of rows that share this same distance, but you will get duplicate entries as the squareform is symmetric. If you want to escape the duplication, set either the upper triangular half, or lower triangular half of your squareform matrix to NaN to tell MATLAB to skip searching in these duplicated areas. You can use MATLAB's tril or triu commands to do that. Take note that either of these methods by default will include the diagonal of the matrix and so there won't be any extra work here. As such, try something like:
distValues = pdist(A); %// Compute pairwise distances
minDist = min(distValues); %// Find minimum distance
maxDist = max(distValues); %// Find maximum distance
distMatrix = squareform(distValues); %// Prettify
distMatrix(triu(true(size(distMatrix)))) = NaN; %// To avoid searching for duplicates
[minI,minJ] = find(distMatrix == minDist); %// Find pairs of vectors with min. distance
[maxI,maxJ] = find(distMatrix == maxDist); %// Find pairs of vectors with max. distance
Judging from your application, you just want to find one such occurrence only, so let's leave it at that, but I'll put that here for you in case you need it.

You mean the max/min distance between any 2 rows? If so, you can try that:
numRows = 6;
A = randn(numRows, 100); %// Example of input matrix
%// Compute distances between each combination of 2 rows
T = nchoosek(1:numRows,2); %// pairs of indexes for all combinations of 2 rows
for k=1:length(T)
d(k) = norm(A(T(k,1),:)-A(T(k,2),:));
end
%// Find min/max distance
[~, minIndex] = min(d);
[~, maxIndex] = max(d);
T(minIndex,:) %// Displays indexes of the 2 rows with minimum distance
T(maxIndex,:) %// Displays indexes of the 2 rows with maximum distance

Related

Calculate Euclidean distance for every row with every other row in a NxM matrix?

I have a matrix that I generate from a CSV file as follows:
X = xlsread('filename.csv');
I am looping through the matrix based on the number of records and I need to find the Euclidean distance for each of the rows of this matrix :
for i = 1:length(X)
j = X(:, [2:5])
end
The resulting matrix is of 150 X 4. What would be the best way to calculate the Euclidean distance of each row (with 4 columns as the data points) with every row and getting an average of the same?
In order to find the Euclidean distance between any pair of rows, you could use the function pdist.
X = randn(6, 4);
D = pdist(X,'euclidean');
res=mean(D);
The average is stored in res.

Distance between any combination of two points

I have 100 coordinates in a variable x in MATLAB . How can I make sure that distance between all combinations of two points is greater than 1?
You can do this in just one simple line, with the functions all and pdist:
if all(pdist(x)>1)
...
end
Best,
First you'll need to generate a matrix that gives you all possible pairs of coordinates. This post can serve as inspiration:
Generate a matrix containing all combinations of elements taken from n vectors
I'm going to assume that your coordinates are stored such that the columns denote the dimensionality and the rows denote how many points you have. As such, for 2D, you would have a 100 x 2 matrix, and in 3D you would have a 100 x 3 matrix and so on.
Once you generate all possible combinations, you simply compute the distance... which I will assume it to be Euclidean here... of all points and ensure that all of them are greater than 1.
As such:
%// Taken from the linked post
vectors = { 1:100, 1:100 }; %// input data: cell array of vectors
n = numel(vectors); %// number of vectors
combs = cell(1,n); %// pre-define to generate comma-separated list
[combs{end:-1:1}] = ndgrid(vectors{end:-1:1}); %// the reverse order in these two
%// comma-separated lists is needed to produce the rows of the result matrix in
%// lexicographical order
combs = cat(n+1, combs{:}); %// concat the n n-dim arrays along dimension n+1
combs = reshape(combs,[],n); %// reshape to obtain desired matrix
%// Index into your coordinates array
source_points = x(combs(:,1), :);
end_points = x(combs(:,2), :);
%// Checks to see if all distances are greater than 1
is_separated = all(sqrt(sum((source_points - end_points).^2, 2)) > 1);
is_separated will contain either 1 if all points are separated by a distance of 1 or greater and 0 otherwise. If we dissect the last line of code, it's a three step procedure:
sum((source_points - end_points).^2, 2) computes the pairwise differences between each component for each pair of points, squares the differences and then sums all of the values together.
sqrt(...(1)...) computes the square root which gives us the Euclidean distance.
all(...(2)... > 1) then checks to see if all of the distances computed in Step #2 were greater than 1 and our result thus follows.

How to create matrix of nearest neighbours from dataset using matrix of indices - matlab

I have an Nx2 matrix of data points where each row is a data point. I also have an NxK matrix of indices of the K nearest neighbours from the knnsearch function. I am trying to create a matrix that contains in each row the data point followed by the K neighbouring data points, i.e. for K = 2 we would have something like [data1, neighbour1, neighbour2] for each row.
I have been messing round with loops and attempting to index with matrices but to no avail, the fact that each datapoint is 1x2 is confusing me.
My ultimate aim is to calculate gradients to train an RBF network in a similar manner to:
D = (x_dist - y_dist)./(y_dist+(y_dist==0));
temp = y';
neg_gradient = -2.*sum(kron(D, ones(1,2)) .* ...
(repmat(y, 1, ndata) - repmat((temp(:))', ndata, 1)), 1);
neg_gradient = (reshape(neg_gradient, net.nout, ndata))';
You could use something along those lines:
K = 2;
nearest = knnsearch(data, data, 'K', K+1);%// Gets point itself and K nearest ones
mat = reshape(data(nearest.',:).',[],N).'; %// Extracts the coordinates
We generate data(nearest.',:) to get a 3*N-by-2 matrix, where every 3 consecutive rows are the points that correspond to each other. We transpose this to get the xy-coordinates into the same column. (MATLAB is column major, i.e. values in a column are stored consecutively). Then we reshape the data, so every column contains the xy-coordinates of the rows of nearest. So we only need to transpose once more in the end.

Matlab calculating nearest neighbour distance for all (u, v) vectors in an array

I am trying to calculate the distance between nearest neighbours within a nx2 matrix like the one shown below
point_coordinates =
11.4179 103.1400
16.7710 10.6691
16.6068 119.7024
25.1379 74.3382
30.3651 23.2635
31.7231 105.9109
31.8653 36.9388
%for loop going from the top of the vector column to the bottom
for counter = 1:size(point_coordinates,1)
%current point defined selected
current_point = point_coordinates(counter,:);
%math to calculate distance between the current point and all the points
distance_search= point_coordinates-repmat(current_point,[size(point_coordinates,1) 1]);
dist_from_current_point = sqrt(distance_search(:,1).^2+distance_search(:,2).^2);
%line to omit self subtraction that gives zero
dist_from_current_point (dist_from_current_point <= 0)=[];
%gives the shortest distance calculated for a certain vector and current_point
nearest_dist=min(dist_from_current_point);
end
%final line to plot the u,v vectors and the corresponding nearest neighbour
%distances
matnndist = [point_coordinates nearest_dist]
I am not sure how to structure the 'for' loop/nearest_neighbour line to be able to get the nearest neighbour distance for each u,v vector.
I would like to have, for example ;
for the first vector you could have the coordinates and the corresponding shortest distance, for the second vector another its shortest distance, and this goes on till n
Hope someone can help.
Thanks
I understand you want to obtain the minimum distance between different points.
You can compute the distance for each pair of points with bsxfun; remove self-distances; minimize. It's more computationally efficient to work with squared distances, and take the square root only at the end.
n = size(point_coordinates,1);
dist = bsxfun(#minus, point_coordinates(:,1), point_coordinates(:,1).').^2 + ...
bsxfun(#minus, point_coordinates(:,2), point_coordinates(:,2).').^2;
dist(1:n+1:end) = inf; %// remove self-distances
min_dist = sqrt(min(dist(:)));
Alternatively, you could use pdist. This avoids computing each distance twice, and also avoids self-distances:
dist = pdist(point_coordinates);
min_dist = min(dist(:));
If I can suggest a built-in function, use knnsearch from the statistics toolbox. What you are essentially doing is a K-Nearest Neighbour (KNN) algorithm, but you are ignoring self-distances. The way you would call knnsearch is in the following way:
[idx,d] = knnsearch(X, Y, 'k', k);
In simple terms, the KNN algorithm returns the k closest points to your data set given a query point. Usually, the Euclidean distance is the distance metric that is used. For MATLAB's knnsearch, X is a 2D array that consists of your dataset where each row is an observation and each column is a variable. Y would be the query points. Y is also a 2D array where each row is a query point and you need to have the same number of columns as X. We would also specify the flag 'k' to denote how many closest points you want returned. By default, k = 1.
As such, idx would be a N x K matrix, where N is the total number of query points (number of rows of Y) and K would be those k closest points to the dataset for each query point we have. idx indicates the particular points in your dataset that were closest to each query. d is also a N x K matrix that returns the smallest distances for these corresponding closest points.
As such, what you want to do is find the closest point for your dataset to each of the other points, ignoring self-distances. Therefore, you would set both X and Y to be the same, and set k = 2, discarding the first column of both outputs to get the result you're looking for.
Therefore:
[idx,d] = knnsearch(point_coordinates, point_coordinates, 'k', 2)
idx = idx(:,2);
d = d(:,2);
We thus get for idx and d:
>> idx
idx =
3
5
1
1
7
3
5
>> d
d =
17.3562
18.5316
17.3562
31.9027
13.7573
20.4624
13.7573
As such, this tells us that for the first point in your data set, it matched with point #3 the best. This matched with the closest distance of 17.3562. For the second point in your data set, it matched with point #5 the best with the closest distance being 18.5316. You can continue on with the rest of the results in a similar pattern.
If you don't have access to the statistics toolbox, consider reading my StackOverflow post on how I compute KNN from first principles.
Finding K-nearest neighbors and its implementation
In fact, it is very similar to Luis Mendo's post to you earlier.
Good luck!

Calculate Euclidean distance between RGB vectors in a large matrix

I have this RGB matrix of a set of different pixels. (N pixels => n rows, RGB => 3 columns). I have to calculate the minimum RGB distance between any two pixels from this matrix. I tried the loop approach, but because the set is too big (let's say N=24000), it looks like it will take forever for the program to finish. Is there another approach? I read about pdist, but the RGB Euclidean distance cannot be used with it.
k=1;
for i = 1:N
for j = 1:N
if (i~=j)
dist_vect(k)=RGB_dist(U(i,1),U(j,1),U(i,2),U(j,2),U(i,3),U(j,3))
k=k+1;
end
end
end
Euclidean distance between two pixels:
So, pdist syntax would be like this: D=pdist2(U,U,#calc_distance());, where U is obtained like this:
rgbImage = imread('peppers.png');
rgb_columns = reshape(rgbImage, [], 3)
[U, m, n] = unique(rgb_columns, 'rows','stable');
But if pdist2 does the loops itself, how should I enter the parameters for my function?
function[distance]=RGB_dist(R1, R2, G1, G2, B1, B2),
where R1,G1,B1,R2,G2,B2 are the components of each pixel.
I made a new function like this:
function[distance]=RGB_dist(x,y)
distance=sqrt(sum(((x-y)*[3;4;2]).^2,2));
end
and I called it D=pdist(U,U,#RGB_dist); and I got 'Error using pdist (line 132)
The 'DISTANCE' argument must be a
string or a function.'
Testing RGB_dist new function alone, with these input set
x=[62,29,64;
63,31,62;
65,29,60;
63,29,62;
63,31,62;];
d=RGB_dist(x,x);
disp(d);
outputs only values of 0.
Contrary to what your post says, you can use the Euclidean distance as part of pdist. You have to specify it as a flag when you call pdist.
The loop you have described above can simply be computed by:
dist_vect = pdist(U, 'euclidean');
This should compute the L2 norm between each unique pair of rows. Seeing that your matrix has a RGB pixel per row, and each column represents a single channel, pdist should totally be fine for your application.
If you want to display this as a distance matrix, where row i and column j corresponds to the distance between a pixel in row i and row j of your matrix U, you can use squareform.
dist_matrix = squareform(dist_vect);
As an additional bonus, if you want to find which two pixels in your matrix share the smallest distance, you can simply do a find search on the lower triangular half of dist_matrix. The diagonals of dist_matrix are going to be all zero as any vector whose distance to itself should be 0. In addition, this matrix is symmetric and so the upper triangular half should be equal to the lower triangular half. Therefore, we can set the diagonal and the upper triangular half to Inf, then search for the minimum for those elements that are remaining. In other words:
indices_to_set = true(size(dist_matrix));
indices_to_set = triu(indices_to_set);
dist_matrix(indices_to_set) = Inf;
[v1,v2] = find(dist_matrix == min(dist_matrix(:)), 1);
v1 and v2 will thus contain the rows of U where those RGB pixels contained the smallest Euclidean distance. Note that we specify the second parameter as 1 as we want to find just one match, as what your post has stated as a requirement. If you wish to find all vectors who match the same distance, simply remove the second parameter 1.
Edit - June 25th, 2014
Seeing as how you want to weight each component of the Euclidean distance, you can define your own custom function to calculate distances between two RGB pixels. As such, instead of specifying euclidean, you can specify your own function which can calculate the distances between two vectors within your matrix by calling pdist like so:
pdist(x, #(XI,XJ) ...);
#(XI,XJ)... is an anonymous function that takes in a vector XI and a matrix XJ. For pdist you need to make sure that the custom distance function takes in XI as a 1 x N vector which is a single row of pixels. XJ is then a M x N matrix that contains multiple rows of pixels. As such, this function needs to return a M x 1 vector of distances. Therefore, we can achieve your weighted Euclidean distance as so:
weights = [3;4;2];
weuc = #(XI, XJ, W) sqrt(bsxfun(#minus, XI, XJ).^2 * W);
dist_matrix = pdist(double(U), #(XI, XJ) weuc(XI, XJ, weights));
bsxfun can handle that nicely as it will replicate XI for as many rows as we need to, and it should compute this single vector with every single element in XJ by subtracting. We thus square each of the differences, weight by weights, then take the square root and sum. Note that I didn't use sum(X,2), but I used vector algebra to compute the sum. If you recall, we are simply computing the dot product between the square distance of each component with a weight. In other words, x^{T}y where x is the square distance of each component and y are the weights for each component. You could do sum(X,2) if you like, but I find this to be more elegant and easy to read... plus it's less code!
Now that I know how you're obtaining U, the type is uint8 so you need to cast the image to double before we do anything. This should achieve your weighted Euclidean distance as we talked about.
As a check, let's put in your matrix in your example, then run it through pdist then squareform
x=[62,29,64;
63,31,62;
65,29,60;
63,29,62;
63,31,62];
weights = [3;4;2];
weuc = #(XI, XJ, W) sqrt(bsxfun(#minus,XI,XJ).^2 * W);
%// Make sure you CAST TO DOUBLE, as your image is uint8
%// We don't have to do it here as x is already a double, but
%// I would like to remind you to do so!
dist_vector = pdist(double(x), #(XI, XJ) weuc(XI, XJ, weights));
dist_matrix = squareform(dist_vector)
dist_matrix =
0 5.1962 7.6811 3.3166 5.1962
5.1962 0 6.0000 4.0000 0
7.6811 6.0000 0 4.4721 6.0000
3.3166 4.0000 4.4721 0 4.0000
5.1962 0 6.0000 4.0000 0
As you can see, the distance between pixels 1 and 2 is 5.1962. To check, sqrt(3*(63-62)^2 + 4*(31-29)^2 + 2*(64-62)^2) = sqrt(3 + 16 + 8) = sqrt(27) = 5.1962. You can do similar checks among elements within this matrix. We can tell that the distance between pixels 5 and 2 is 0 as you have made these rows of pixels the same. Also, the distance between each of themselves is also 0 (along the diagonal). Cool!