I have a vector of 3D points lets say A as shown below,
A=[
-0.240265581092000 0.0500598627544876 1.20715641293013
-0.344503191645519 0.390376667574812 1.15887540716612
-0.0931248606994074 0.267137193112796 1.24244644549763
-0.183530493218807 0.384249186312578 1.14512014134276
-0.0201358671977785 0.404732019283683 1.21816745283019
-0.242108038906952 0.229873488902244 1.24229940627651
-0.391349107031230 0.262170158259873 1.23856838565023
]
what I want to do is to connect 3D points with lines which only have distance less than a specific threshold T. I want to get a list of pairs of points needed to be connected. Such as,
[
( -0.240265581092000 0.0500598627544876 1.20715641293013), (-0.344503191645519 0.390376667574812 1.15887540716612);
(-0.0931248606994074 0.267137193112796 1.24244644549763),(-0.183530493218807 0.384249186312578 1.14512014134276),.....
]
So as shown, I'll have a vector of pairs of points needed to be connected. So if anyone could please advise how this can be done in Matlab.
The following example demonstrates how to accomplish this.
%# Build an example matrix
A = [1 2 3; 0 0 0; 3 1 3; 2 0 2; 0 1 0];
Threshold = 3;
%# Calculate distance between all points
D = pdist2(A, A);
%# Discard any points with distance greater than threshold
D(D > Threshold) = nan;
If you wish to extract an index of all observation pairs that are linked by a distance less than (or equal to) Threshold, as well as the corresponding distance (your question didn't specify what form you wanted the output to take, so I am essentially guessing here), then instead use the following:
%# Obtain a list of linear indices of observations less than or equal to TH
I1 = find(D <= Threshold);
%#Extract the actual distances, as well as the corresponding observation indices from A
[Obs1Index, Obs2Index] = ind2sub(size(D), I1);
DList = [Obs1Index, Obs2Index, D(I1)];
Note, pdist2 uses Euclidean distance by default, but there are other options - see the documentation here.
UPDATE: Based on the OP's comments, the following code will express the output as a K*6 matrix, where K is the number of distance measures less than the threshold value, and the first three columns of each row is the first data point (3 dimensions) and the second three columns of each row is the connected data point.
DList2 = [A(Obs1Index, :), A(Obs2Index, :)];
SECOND UPDATE: I have not made any assumptions on the distance measure in this answer. That is, I'm deliberately using pdist2 in case your distance measure is not symmetric. However, if you are using a symmetric distance measure, then you could probably speed up the run-time by using pdist instead, although my indexing code would need to be adjusted accordingly.
Plot3 and pdist2 can be used to achieve what you want.
D=pdist2(A,A);
T=0.2;
for i=1:7
for j=i+1:7
if D(i,j)<T & D(i,j)~=0
i
j
plot3(A([i j],1),A([i j],2),A([i j],3));
hold on;
fprintf('line is plotted\n');
pause;
end
end
end
Related
I am still wrapping my head around vectorization and I'm having a difficult time trying to resolve the following function I made...
for i = 1:size(X, 1)
min_n = inf;
for j=1:K
val = X(i,:)' - centroids(j,:)';
diff = val'*val;
if (diff < min_n)
idx(i) = j;
min_n = diff;
end
end
end
X is an array of (x,y) coordinates...
2 5
5 6
...
...
centroids in this example is limited to 3 rows. It is also in (x,y) format as shown above.
For every pair in X I am computing the closest pair of centroids. I then store the index of the centroid in idx.
So idx(i) = j means that I am storing the index j of the centroid at index i, where i corresponds to the index of X. This means the closest centroid to pair X(i, :) is at idx(i).
Can I possibly simplify this via vectorization? I struggle with just vectorizing the inner loop.
Here are three options. But please note that the disadvantage of vectorization, as compared to your double loops, is that it stores all the difference operation results at once, which means that if your matrices have many rows, you might run out of memory. On the other hand, the vectorized approach is probably much faster.
Option 1
If you have access to Statistics and Machine Learning Toolbox, you can use the function pdist2 to get all the pairwise distances between rows of two matrices. Then, the min function gives you the minimum of each column of the result. Its first returned value are the minimal values, and its second are the indices, which is what you need for idx:
diff = pdist2(centroids,X);
[~,idx] = min(diff);
Option 2
If you don't have access to the toolbox, you can use bsxfun. This will let you compute the difference operation between the two matrices even if their dimensions don't agree. All you need to do is to use shiftdim to reshape X' to have size [1,size(X,2),size(X,1)], and then reshapedX and and centroids are compatible with their dimensions (see documentation of bsxfun). This lets you take the difference between their values. The result is a three dimensional array, which you need to sum along the second dimension to get the norm of the differences between rows. At this point you can proceed as in option 1.
reshapedX = shiftdim(X',-1);
diff = bsxfun(#minus,centroids,reshapedX);
diff = squeeze(sum(diff.^2,2));
[~,idx] = min(diff);
Note: Starting in the Matlab version 2016b, the bsxfun is used implicitly and you do not need to call it anymore. So the line with bsxfun can be replaced with the simpler line diff = centroids-reshapedX.
Option 3
Use the function dsearchn, which performs exactly what you need:
idx = dsearchn(centroids,X);
it could be done using pdist2 - pairwise distances between rows of two matrices:
% random data
X = rand(500,2);
centroids = rand(3,2);
% pairwise distances
D = pdist2(X,centroids);
% closest centroid index for each X coordinates
[~,idx] = min(D,[],2)
% plot
scatter(centroids(:,1),centroids(:,2),300,(1:size(centroids,1))','filled');
hold on;
scatter(X(:,1),X(:,2),30,idx);
legend('Centroids','data');
I am trying to calculate the distance between nearest neighbours within a nx2 matrix like the one shown below
point_coordinates =
11.4179 103.1400
16.7710 10.6691
16.6068 119.7024
25.1379 74.3382
30.3651 23.2635
31.7231 105.9109
31.8653 36.9388
%for loop going from the top of the vector column to the bottom
for counter = 1:size(point_coordinates,1)
%current point defined selected
current_point = point_coordinates(counter,:);
%math to calculate distance between the current point and all the points
distance_search= point_coordinates-repmat(current_point,[size(point_coordinates,1) 1]);
dist_from_current_point = sqrt(distance_search(:,1).^2+distance_search(:,2).^2);
%line to omit self subtraction that gives zero
dist_from_current_point (dist_from_current_point <= 0)=[];
%gives the shortest distance calculated for a certain vector and current_point
nearest_dist=min(dist_from_current_point);
end
%final line to plot the u,v vectors and the corresponding nearest neighbour
%distances
matnndist = [point_coordinates nearest_dist]
I am not sure how to structure the 'for' loop/nearest_neighbour line to be able to get the nearest neighbour distance for each u,v vector.
I would like to have, for example ;
for the first vector you could have the coordinates and the corresponding shortest distance, for the second vector another its shortest distance, and this goes on till n
Hope someone can help.
Thanks
I understand you want to obtain the minimum distance between different points.
You can compute the distance for each pair of points with bsxfun; remove self-distances; minimize. It's more computationally efficient to work with squared distances, and take the square root only at the end.
n = size(point_coordinates,1);
dist = bsxfun(#minus, point_coordinates(:,1), point_coordinates(:,1).').^2 + ...
bsxfun(#minus, point_coordinates(:,2), point_coordinates(:,2).').^2;
dist(1:n+1:end) = inf; %// remove self-distances
min_dist = sqrt(min(dist(:)));
Alternatively, you could use pdist. This avoids computing each distance twice, and also avoids self-distances:
dist = pdist(point_coordinates);
min_dist = min(dist(:));
If I can suggest a built-in function, use knnsearch from the statistics toolbox. What you are essentially doing is a K-Nearest Neighbour (KNN) algorithm, but you are ignoring self-distances. The way you would call knnsearch is in the following way:
[idx,d] = knnsearch(X, Y, 'k', k);
In simple terms, the KNN algorithm returns the k closest points to your data set given a query point. Usually, the Euclidean distance is the distance metric that is used. For MATLAB's knnsearch, X is a 2D array that consists of your dataset where each row is an observation and each column is a variable. Y would be the query points. Y is also a 2D array where each row is a query point and you need to have the same number of columns as X. We would also specify the flag 'k' to denote how many closest points you want returned. By default, k = 1.
As such, idx would be a N x K matrix, where N is the total number of query points (number of rows of Y) and K would be those k closest points to the dataset for each query point we have. idx indicates the particular points in your dataset that were closest to each query. d is also a N x K matrix that returns the smallest distances for these corresponding closest points.
As such, what you want to do is find the closest point for your dataset to each of the other points, ignoring self-distances. Therefore, you would set both X and Y to be the same, and set k = 2, discarding the first column of both outputs to get the result you're looking for.
Therefore:
[idx,d] = knnsearch(point_coordinates, point_coordinates, 'k', 2)
idx = idx(:,2);
d = d(:,2);
We thus get for idx and d:
>> idx
idx =
3
5
1
1
7
3
5
>> d
d =
17.3562
18.5316
17.3562
31.9027
13.7573
20.4624
13.7573
As such, this tells us that for the first point in your data set, it matched with point #3 the best. This matched with the closest distance of 17.3562. For the second point in your data set, it matched with point #5 the best with the closest distance being 18.5316. You can continue on with the rest of the results in a similar pattern.
If you don't have access to the statistics toolbox, consider reading my StackOverflow post on how I compute KNN from first principles.
Finding K-nearest neighbors and its implementation
In fact, it is very similar to Luis Mendo's post to you earlier.
Good luck!
I need to compute a moving average over a data series, within a for loop. I have to get the moving average over N=9 days. The array I'm computing in is 4 series of 365 values (M), which itself are mean values of another set of data. I want to plot the mean values of my data with the moving average in one plot.
I googled a bit about moving averages and the "conv" command and found something which i tried implementing in my code.:
hold on
for ii=1:4;
M=mean(C{ii},2)
wts = [1/24;repmat(1/12,11,1);1/24];
Ms=conv(M,wts,'valid')
plot(M)
plot(Ms,'r')
end
hold off
So basically, I compute my mean and plot it with a (wrong) moving average. I picked the "wts" value right off the mathworks site, so that is incorrect. (source: http://www.mathworks.nl/help/econ/moving-average-trend-estimation.html) My problem though, is that I do not understand what this "wts" is. Could anyone explain? If it has something to do with the weights of the values: that is invalid in this case. All values are weighted the same.
And if I am doing this entirely wrong, could I get some help with it?
My sincerest thanks.
There are two more alternatives:
1) filter
From the doc:
You can use filter to find a running average without using a for loop.
This example finds the running average of a 16-element vector, using a
window size of 5.
data = [1:0.2:4]'; %'
windowSize = 5;
filter(ones(1,windowSize)/windowSize,1,data)
2) smooth as part of the Curve Fitting Toolbox (which is available in most cases)
From the doc:
yy = smooth(y) smooths the data in the column vector y using a moving
average filter. Results are returned in the column vector yy. The
default span for the moving average is 5.
%// Create noisy data with outliers:
x = 15*rand(150,1);
y = sin(x) + 0.5*(rand(size(x))-0.5);
y(ceil(length(x)*rand(2,1))) = 3;
%// Smooth the data using the loess and rloess methods with a span of 10%:
yy1 = smooth(x,y,0.1,'loess');
yy2 = smooth(x,y,0.1,'rloess');
In 2016 MATLAB added the movmean function that calculates a moving average:
N = 9;
M_moving_average = movmean(M,N)
Using conv is an excellent way to implement a moving average. In the code you are using, wts is how much you are weighing each value (as you guessed). the sum of that vector should always be equal to one. If you wish to weight each value evenly and do a size N moving filter then you would want to do
N = 7;
wts = ones(N,1)/N;
sum(wts) % result = 1
Using the 'valid' argument in conv will result in having fewer values in Ms than you have in M. Use 'same' if you don't mind the effects of zero padding. If you have the signal processing toolbox you can use cconv if you want to try a circular moving average. Something like
N = 7;
wts = ones(N,1)/N;
cconv(x,wts,N);
should work.
You should read the conv and cconv documentation for more information if you haven't already.
I would use this:
% does moving average on signal x, window size is w
function y = movingAverage(x, w)
k = ones(1, w) / w
y = conv(x, k, 'same');
end
ripped straight from here.
To comment on your current implementation. wts is the weighting vector, which from the Mathworks, is a 13 point average, with special attention on the first and last point of weightings half of the rest.
Say we have a matrix m x n where the number of rows of the matrix is very big. If we assume each row is a vector, then how could one find the maximum/minimum distance between vectors in this matrix?
My suggestion would be to use pdist. This computes pairs of Euclidean distances between unique combinations of observations like #seb has suggested, but this is already built into MATLAB. Your matrix is already formatted nicely for pdist where each row is an observation and each column is a variable.
Once you do apply pdist, apply squareform so that you can display the distance between pairwise entries in a more pleasant matrix form. The (i,j) entry for each value in this matrix tells you the distance between the ith and jth row. Also note that this matrix will be symmetric and the distances along the diagonal will inevitably equal to 0, as any vector's distance to itself must be zero. If your minimum distance between two different vectors were zero, if we were to search this matrix, then it may possibly report a self-distance instead of the actual distance between two different vectors. As such, in this matrix, you should set the diagonals of this matrix to NaN to avoid outputting these.
As such, assuming your matrix is A, all you have to do is this:
distValues = pdist(A); %// Compute pairwise distances
minDist = min(distValues); %// Find minimum distance
maxDist = max(distValues); %// Find maximum distance
distMatrix = squareform(distValues); %// Prettify
distMatrix(logical(eye(size(distMatrix)))) = NaN; %// Ignore self-distances
[minI,minJ] = find(distMatrix == minDist, 1); %// Find the two vectors with min. distance
[maxI,maxJ] = find(distMatrix == maxDist, 1); %// Find the two vectors with max. distance
minI, minJ, maxI, maxJ will return the two rows of A that produced the smallest distance and the largest distance respectively. Note that with the find statement, I have made the second parameter 1 so that it only returns one pair of vectors that have this minimum / maximum distance between each other. However, if you omit this parameter, then it will return all possible pairs of rows that share this same distance, but you will get duplicate entries as the squareform is symmetric. If you want to escape the duplication, set either the upper triangular half, or lower triangular half of your squareform matrix to NaN to tell MATLAB to skip searching in these duplicated areas. You can use MATLAB's tril or triu commands to do that. Take note that either of these methods by default will include the diagonal of the matrix and so there won't be any extra work here. As such, try something like:
distValues = pdist(A); %// Compute pairwise distances
minDist = min(distValues); %// Find minimum distance
maxDist = max(distValues); %// Find maximum distance
distMatrix = squareform(distValues); %// Prettify
distMatrix(triu(true(size(distMatrix)))) = NaN; %// To avoid searching for duplicates
[minI,minJ] = find(distMatrix == minDist); %// Find pairs of vectors with min. distance
[maxI,maxJ] = find(distMatrix == maxDist); %// Find pairs of vectors with max. distance
Judging from your application, you just want to find one such occurrence only, so let's leave it at that, but I'll put that here for you in case you need it.
You mean the max/min distance between any 2 rows? If so, you can try that:
numRows = 6;
A = randn(numRows, 100); %// Example of input matrix
%// Compute distances between each combination of 2 rows
T = nchoosek(1:numRows,2); %// pairs of indexes for all combinations of 2 rows
for k=1:length(T)
d(k) = norm(A(T(k,1),:)-A(T(k,2),:));
end
%// Find min/max distance
[~, minIndex] = min(d);
[~, maxIndex] = max(d);
T(minIndex,:) %// Displays indexes of the 2 rows with minimum distance
T(maxIndex,:) %// Displays indexes of the 2 rows with maximum distance
I need to generate N random coordinates for a 2D plane. The distance between any two points are given (number of distance is N(N - 1) / 2). For example, say I need to generate 3 points i.e. A, B, C. I have the distance between pair of them i.e. distAB, distAC and distBC.
Is there any built-in function in MATLAB that can do this? Basically, I'm looking for something that is the reverse of pdist() function.
My initial idea was to choose a point (say A is the origin). Then, I can randomly find B and C being on two different circles with radii distAB and distAC. But then the distance between B and C might not satisfy distBC and I'm not sure how to proceed if this happens. And I think this approach will get very complicated if N is a large number.
Elaborating on Ansaris answer I produced the following. It assumes a valid distance matrix provided, calculates positions in 2D based on cmdscale, does a random rotation (random translation could be added also), and visualizes the results:
%Distance matrix
D = [0 2 3; ...
2 0 4; ...
3 4 0];
%Generate point coordinates based on distance matrix
Y = cmdscale(D);
[nPoints dim] = size(Y);
%Add random rotation
randTheta = 2*pi*rand(1);
Rot = [cos(randTheta) -sin(randTheta); sin(randTheta) cos(randTheta) ];
Y = Y*Rot;
%Visualization
figure(1);clf;
plot(Y(:,1),Y(:,2),'.','markersize',20)
hold on;t=0:.01:2*pi;
for r = 1 : nPoints - 1
for c = r+1 : nPoints
plot(Y(r,1)+D(r,c)*sin(t),Y(r,2)+D(r,c)*cos(t));
plot(Y(c,1)+D(r,c)*sin(t),Y(c,2)+D(r,c)*cos(t));
end
end
You want to use a technique called classical multidimensional scaling. It will work fine and losslessly if the distances you have correspond to distances between valid points in 2-D. Luckily there is a function in MATLAB that does exactly this: cmdscale. Once you run this function on your distance matrix, you can treat the first two columns in the first output argument as the points you need.