Nearest point between two clusters Matlab - matlab

I have a set of clusters consisting of 3D points. For every pair of clusters, I want to find the two nearest points, one from each cluster.
For example: I have 5 clusters, C1 to C5, each consisting of 3D points. For C1 and C2 there are two points, Pc1 (a point in C1) and Pc2 (a point in C2), that are the closest two points between the clusters C1 and C2; the same holds between C1 and C3..C5, between C2 and C3..C5, and so on. After that I'll have 20 points representing the nearest points between the different clusters.
The second thing is that I want to connect these points together if the distance between them is less than a certain threshold.
So I'm asking if anyone could please advise me
Update:
Thanks Amro for your answer. I've updated it to CIDX=kmeans(X, K,'distance','cityblock', 'replicates',5); to solve the empty cluster error, but another error appeared: "pdistmex Out of memory. Type HELP MEMORY for your options." So I checked your answer here: Out of memory error while using clusterdata in MATLAB, and updated your code as below. The problem now is that there is an indexing error in this line: mn = min(min(D(idx1,idx2))); Is there a workaround for this error?
Code used:
%function single_linkage(depth,clrr)
X = randn(5000,3);
%X=XX;
% clr = clrr;
K=7;
clr = jet(K);
%// cluster into K=4
K = 7;
%CIDX = kmeans(X,K);
%// pairwise distances
SUBSET_SIZE = 1000; %# subset size
ind = randperm(size(X,1));
data = X(ind(1:SUBSET_SIZE), :);
D = squareform(pdist(data));
subs = 1:size(D,1);
CIDX=kmeans(D, K,'distance','sqEuclidean', 'replicates',5);
centers = zeros(K, size(data,2));
for i=1:size(data,2)
centers(:,i) = accumarray(CIDX, data(:,i), [], @mean);
end
%# calculate distance of each instance to all cluster centers
D = zeros(size(X,1), K);
for k=1:K
D(:,k) = sum( bsxfun(@minus, X, centers(k,:)).^2, 2);
end
%D=squareform(D);
%# assign each instance to the closest cluster
[~,clustIDX] = min(D, [], 2);
%// for each pair of clusters
cpairs = nchoosek(1:K,2);
pairs = zeros(size(cpairs));
dists = zeros(size(cpairs,1),1);
for i=1:size(cpairs,1)
%// index of points assigned to each of the two cluster
idx1 = (clustIDX == cpairs(i,1));
idx2 = (clustIDX == cpairs(i,2));
%// shortest distance between the two clusters
mn = min(min(D(idx1,idx2)));
dists(i) = mn;
%// corresponding pair of points with the minimum distance
[r,c] = find(D(idx1,idx2)==mn);
s1 = subs(idx1); s2 = subs(idx2);
pairs(i,:) = [s1(r) s2(c)];
end
%// filter pairs by keeping only those whose distances is below a threshold
thresh = inf;
cpairs(dist>thresh,:) = [];
%// plot 3D points color-coded by clusters
figure('renderer','zbuffer')
%clr = lines(K);
h = zeros(1,K);
for i=1:K
h(i) = line(X(CIDX==i,1), X(CIDX==i,2), X(CIDX==i,3), ...
'Color',clr(i,:), 'LineStyle','none', 'Marker','.', 'MarkerSize',5);
end
legend(h, num2str((1:K)', 'C%d')) %'
view(3), axis vis3d, grid on
%// mark and connect nearest points between each pair of clusters
for i=1:size(pairs,1)
line(X(pairs(i,:),1), X(pairs(i,:),2), X(pairs(i,:),3), ...
'Color','k', 'LineStyle','-', 'LineWidth',3, ...
'Marker','o', 'MarkerSize',10);
end

What you are asking for sounds similar to what single-linkage clustering does at each step: working from the bottom up, the clusters separated by the shortest distance are combined.
Anyway below is the brute-force way of solving this. I'm sure there are more efficient implementations, but this one is easy to implement.
%// data of 3D points
X = randn(5000,3);
%// cluster into K=4
K = 4;
CIDX = kmeans(X,K);
%// pairwise distances
D = squareform(pdist(X));
subs = 1:size(X,1);
%// for each pair of clusters
cpairs = nchoosek(1:K,2);
pairs = zeros(size(cpairs));
dists = zeros(size(cpairs,1),1);
for i=1:size(cpairs,1)
%// index of points assigned to each of the two cluster
idx1 = (CIDX == cpairs(i,1));
idx2 = (CIDX == cpairs(i,2));
%// shortest distance between the two clusters
mn = min(min(D(idx1,idx2)));
dists(i) = mn;
%// corresponding pair of points with the minimum distance
[r,c] = find(D(idx1,idx2)==mn, 1); %// keep the first match only, in case of ties
s1 = subs(idx1); s2 = subs(idx2);
pairs(i,:) = [s1(r) s2(c)];
end
%// filter pairs by keeping only those whose distance is below a threshold
thresh = inf; %// use your threshold value instead
keep = (dists <= thresh);
cpairs = cpairs(keep,:); pairs = pairs(keep,:); dists = dists(keep);
%// plot 3D points color-coded by clusters
figure('renderer','zbuffer')
clr = lines(K);
h = zeros(1,K);
for i=1:K
h(i) = line(X(CIDX==i,1), X(CIDX==i,2), X(CIDX==i,3), ...
'Color',clr(i,:), 'LineStyle','none', ...
'Marker','.', 'MarkerSize',5);
end
legend(h, num2str((1:K)', 'C%d')) %'
view(3), axis vis3d, grid on
%// mark and connect nearest points between each pair of clusters
for i=1:size(pairs,1)
line(X(pairs(i,:),1), X(pairs(i,:),2), X(pairs(i,:),3), ...
'Color','k', 'LineStyle','-', 'LineWidth',3, ...
'Marker','o', 'MarkerSize',10);
end
Note that in the above example the data is randomly generated and not very interesting, so it is hard to see the connected nearest points.
Just for fun, here is another result where I simply replaced the min-distance with the max-distance between each pair of clusters (similar to complete-linkage clustering), i.e. use:
mx = max(max(D(idx1,idx2)));
instead of the previous:
mn = min(min(D(idx1,idx2)));
which shows how we connect the farthest points between each pair of clusters. This visualization is a bit more interesting in my opinion :)
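If the full N-by-N matrix from squareform(pdist(X)) does not fit in memory, one option is to compute distances only between the two clusters being compared. The following is a minimal sketch of that idea, not the code used above: the tiny data set, the hand-made `labels` vector (standing in for the kmeans output), the `connected` variable and the threshold value are all assumptions chosen for illustration.

```matlab
% Sketch: nearest pair of points between every two clusters, computing
% only one cluster-to-cluster distance block at a time.
% Assumption: "labels" would normally come from kmeans; here it is made
% up by hand so the example is self-contained.
X = [0 0 0; 1 0 0; 5 0 0; 6 0 0; 5 4 0; 5 9 0];
labels = [1; 1; 2; 2; 3; 3];
K = max(labels);

cpairs = nchoosek(1:K, 2);            % all pairs of clusters
pairs  = zeros(size(cpairs));         % row indices into X, one per cluster
dists  = zeros(size(cpairs,1), 1);    % matching shortest distances
for p = 1:size(cpairs,1)
    i1 = find(labels == cpairs(p,1));
    i2 = find(labels == cpairs(p,2));
    A = X(i1,:);
    B = X(i2,:);
    % squared Euclidean distances, numel(i1)-by-numel(i2)
    D2 = bsxfun(@plus, sum(A.^2,2), sum(B.^2,2).') - 2*(A*B.');
    [mn2, lin] = min(D2(:));          % min returns the first minimum on ties
    [r, c] = ind2sub(size(D2), lin);
    pairs(p,:) = [i1(r) i2(c)];
    dists(p)   = sqrt(max(mn2, 0));   % clamp tiny negative round-off
end

thresh = 4.5;                         % hypothetical threshold
keep = (dists <= thresh);
connected = pairs(keep,:);            % pairs close enough to be connected
```

Here pairs comes out as [2 3; 2 5; 3 5] and only the first and third pairs (distance 4) pass the threshold. If the Statistics Toolbox is available, pdist2(A,B) could replace the bsxfun line.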

Related

How to calculate distance from cluster centroids from the outputs of Matlab kmean() function

I have 2 output clusters from the MATLAB kmeans function:
[idx,C] = kmeans(X,2);
I don't know how to calculate the distance between each point in a cluster and its centroid using "idx".
I want to get a matrix of all points whose distance to their centroid is > 2.
% not Matlab code; just illustrating concept
example
c1->{x1,x2}=
x1-c1=3
x2-c1=2
c2->{y1,y2}=
y1-c2=4
y2-c2=1
output={y1,x1}
Try it this way:
Update The answer now uses loops.
r = randn(300,2)*5;
r(151:end,:) = r(151:end,:) + 15;
n_clusters = 2;
[idx, C] = kmeans(r, n_clusters);
clusters = cell(n_clusters, 1);
distances = cell(n_clusters, 1);
for ii = 1:n_clusters
clusters{ii} = r(idx==ii, :);
distances{ii} = sqrt(sum((clusters{ii}-C(ii,:)).^2,2));
end
figure;
subplot(1,2,1);
for ii = 1:n_clusters
plot(clusters{ii}(:,1), clusters{ii}(:,2), '.');
hold on
plot(C(ii,1), C(ii,2), 'ko','MarkerFaceColor', 'w');
end
title('Clusters and centroids');
subplot(1,2,2);
for ii = 1:n_clusters
plot(clusters{ii}(distances{ii} > 2,1), clusters{ii}(distances{ii} > 2,2), '.');
hold on
plot(C(ii,1), C(ii,2), 'ko','MarkerFaceColor', 'w');
end
title('Centroids and points with distance > 2');
To get a cell array of matrices containing the points whose distance to the centroid is larger than 2, you can do:
distant_points = cell(n_clusters,1);
for ii = 1:n_clusters
distant_points{ii} = clusters{ii}(distances{ii} > 2,:)
end
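As a possible shortcut, the per-cluster loop can be avoided by indexing the centroid matrix with idx, which lines up each point with its own centroid. This is a small sketch with hand-made values for X, idx and C (assumptions standing in for the kmeans outputs), and the names d, far and far_cluster are hypothetical.

```matlab
% Sketch: distance of every point to its own centroid, without cell arrays.
% Assumption: idx and C mimic the outputs of [idx, C] = kmeans(X, 2).
X   = [0 0; 6 0; 10 10; 10 11];
idx = [1; 1; 2; 2];
C   = [3 0; 10 10.5];                 % centroid of each cluster

% C(idx,:) repeats the matching centroid for every row of X
d = sqrt(sum((X - C(idx,:)).^2, 2));

far         = X(d > 2, :);            % points farther than 2 from their centroid
far_cluster = idx(d > 2);             % cluster each of those points belongs to
```

With these values d is [3; 3; 0.5; 0.5], so far holds the two points of cluster 1.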

MATLAB's treeplot: align by node height from top

Here's what I get by using the treeplot function on MATLAB (this is the example image):
Here's what I'd like to get:
As you can see, I'd like to have the position of each node according to its distance from the root. Is that possible?
I was also looking for a "root-aligned" treeplot in Matlab and found no solution. Here is what I came up with, in case someone is still in need of it (I'm using the same example as in the Matlab documentation):
nodes = [0 1 2 2 4 4 4 1 8 8 10 10];
At first we need to get the x and y coordinates of every node in the original tree plot and find all leaves in it:
[x,y] = treelayout(nodes);
leaves = find( y == min(y) );
Next, we reconstruct every chain in the tree plot and store it in a matrix (by doing so, we can later change the y position of the nodes):
num_layers = 1/min(y)-1;
chains = zeros(num_layers, length(leaves));
for l=1:length(leaves)
index = leaves(l);
chain = [];
chain(1) = index;
parent_index = nodes(index);
j = 2;
while (parent_index ~= 0)
chain(j) = parent_index;
parent_index = nodes(parent_index);
j = j+1;
end
chains(:,l) = padarray(flip(chain), [0, num_layers-length(chain)], 'post');
end
Now we compute the new y-coordinates determined by the row index in the matrix and dependent on the number of layers in the tree:
y_new = zeros(size(y));
for i=1:length(nodes)
[r,c] = find(chains==i, 1);
y_new(i) = max(y) - (r-1)*1/(num_layers+1);
end
We can now plot the re-positioned nodes and add the connecting lines:
plot(x, y_new, 'o');
hold on
for c=1:size(chains, 2)
line_x = x(chains(chains(:,c)>0, c));
line_y = y_new(chains(chains(:,c)>0, c));
line(line_x, line_y);
end
If you like, you can also add the node labels to the plot:
for t=1:length(nodes)
text(x(t)+0.025, y_new(t), num2str(t));
end
xlim([0 1]);
ylim([0 1]);
The resulting figure looks as follows:
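An alternative that may be simpler is to compute each node's depth directly from the parent vector and derive the y-coordinate from it. This is only a sketch: the y_new scaling below is an assumption chosen to keep the values inside (0,1), not the layout treeplot itself uses.

```matlab
% Sketch: depth of every node, counted as the number of steps to the root.
nodes = [0 1 2 2 4 4 4 1 8 8 10 10];  % parent of each node, 0 marks the root
depth = zeros(size(nodes));
for i = 1:numel(nodes)
    p = nodes(i);
    while p ~= 0                      % walk up until the root is reached
        depth(i) = depth(i) + 1;
        p = nodes(p);
    end
end

% Assumed scaling: root near the top, deepest layer near the bottom
y_new = 1 - (depth + 1) / (max(depth) + 2);
```

The x-coordinates can still be taken from [x,~] = treelayout(nodes), so plot(x, y_new, 'o') would place all nodes of equal depth on the same horizontal line.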

Sorting two column vectors into 3D matrix based on position

I am using the imfindcircles function in MATLAB to track circles in two images. I start with approximately a grid of circles, which then deforms. I am trying to sort the two column vectors from imfindcircles into matrices so that neighbouring circles are neighbouring elements in the matrices. In the first image the circles conform to a grid and the following code works:
[centXsort,IX] = sortrows(centres1,1); %sort by x
centYsort =zeros(289,2); %preallocate
for i = 1:17:289
[sortedY,IY] = sortrows(centXsort(i:i+16,:),2); %sort by y within individual column
centYsort(i:i+16,:) = sortedY;
end
cent1mat = reshape(centYsort,17,17,2); %reshape into centre matrices
This doesn't work for the second image as some of the circles overlap in the x or y direction, but neighbouring circles never overlap. This means that in the second set of matrices the neighbouring circles aren't neighbouring elements after sorting.
Is there a way to approximate a scatter of points into a matrix?
This answer doesn't work in every single case, but it seems good enough for situations where the points don't vary too wildly.
My idea is to start at the grid corners and work our way along the outside diagonals of the matrix, trying to "grab" the nearest points that seem to fit the grid-points, based on any surrounding points we've already captured.
You will need to provide:
The number of rows (rows) and columns (cols) in the grid.
Your data points P arranged in an N x 2 array, rescaled to the unit square [0,1] x [0,1]. (I assume you can do this through visual inspection of the corner points of your original data.)
A weight parameter edge_weight to tell the algorithm how much the border points should be attracted to the grid border. Some tests show that 3-5 or so are good values.
The code, with a test case included:
%// input parameters
rows = 11;
cols = 11;
edge_weight = 4;
%// function for getting the Euclidean distances between the points list P and a specific point pt
getErr = @(P,pt) sqrt( sum( bsxfun(@minus,P,pt(:)').^2, 2 ) ); %'
output_grid = zeros(rows,cols,2); %// output grid matrix
check_grid = zeros(rows,cols); %// matrix flagging the gridpoints we have covered
[ROW,COL] = meshgrid(... %// coordinate points of an "ideal grid"
linspace(0,1,rows),...
linspace(0,1,cols));
%// create a test case
G = [ROW(:),COL(:)]; %// the actual grid-points
noise_factor = 0.35; %// noise radius allowed
rn = noise_factor/rows;
cn = noise_factor/cols;
row_noise = -rn + 2*rn*rand(numel(ROW),1);
col_noise = -cn + 2*cn*rand(numel(ROW),1);
P = G + [row_noise,col_noise]; %// add noise to get points
%// MAIN LOOP
d = 0;
while ~isempty(P) %// while points remain...
d = d+1; %// increase diagonal depth (d=1 are the outer corners)
for ii = max(d-rows+1,1):min(d,rows)%// for every row number i...
i = ii;
j = d-i+1; %// on the dth diagonal, we have d=i+j-1
for c = 1:4 %// repeat for all 4 corners
if i<rows & j<cols & ~check_grid(i,j) %// check for out-of-bounds/repetitions
check_grid(i,j) = true; %// flag gridpoint
current_gridpoint = [ROW(i,j),COL(i,j)];
%// get error between all remaining points and the next gridpoint's neighbours
if i>1
errI = getErr(P,output_grid(i-1,j,:));
else
errI = edge_weight*getErr(P,current_gridpoint);
end
if check_grid(i+1,j)
errI = errI + edge_weight*getErr(P,current_gridpoint);
end
if j>1
errJ = getErr(P,output_grid(i,j-1,:));
else
errJ = edge_weight*getErr(P,current_gridpoint);
end
if check_grid(i,j+1)
errJ = errJ + edge_weight*getErr(P,current_gridpoint);
end
err = errI.^2 + errJ.^2;
%// find the point with minimal error, add it to the grid,
%// and delete it from the points list
[~,idx] = min(err);
output_grid(i,j,:) = permute( P(idx,:), [1 3 2] );
P(idx,:) = [];
end
%// rotate the grid 90 degrees and repeat for next corner
output_grid = cat(3, rot90(output_grid(:,:,1)), rot90(output_grid(:,:,2)));
check_grid = rot90(check_grid);
ROW = rot90(ROW);
COL = rot90(COL);
end
end
end
Code for plotting the resulting points with edges:
%// plotting code
figure(1); clf; hold on;
axis([-0.1 1.1 -0.1 1.1])
for i = 1:size(output_grid,1)
for j = 1:size(output_grid,2)
scatter(output_grid(i,j,1),output_grid(i,j,2),'b')
if i < size(output_grid,1)
plot( [output_grid(i,j,1),output_grid(i+1,j,1)],...
[output_grid(i,j,2),output_grid(i+1,j,2)],...
'r');
end
if j < size(output_grid,2)
plot( [output_grid(i,j,1),output_grid(i,j+1,1)],...
[output_grid(i,j,2),output_grid(i,j+1,2)],...
'r');
end
end
end
I've developed a solution which works for my case but might not be as robust as required for some. It requires a known number of dots in a 'square' grid and a rough idea of the spacing between the dots. I find the 'AlphaShape' of the dots and all the points that lie along the edge. The edge vector is shifted to start at the minimum and then wrapped around a matrix, and the corresponding points are discarded from the list of vertices. Probably not the best idea for large point clouds, but good enough for me.
R = 50; % search radius
xy = centres2;
x = centres2(:,1);
y = centres2(:,2);
for i = 1:8
T = delaunay(xy); % delaunay
[~,r] = circumcenter(triangulation(T,x,y)); % circumcenters
T = T(r < R,:); % points within radius
B = freeBoundary(triangulation(T,x,y)); % find edge vertices
A = B(:,1);
EdgeList = [x(A) y(A) x(A)+y(A)]; % find point closest to origin and rotate vector
[~,I] = min(EdgeList);
EdgeList = circshift(EdgeList,-I(3)+1);
n = sqrt(length(xy)); % define zeros matrix
matX = zeros(n); % wrap x vector around zeros matrix
matX(1,1:n) = EdgeList(1:n,1);
matX(2:n-1,n) = EdgeList(n+1:(2*n)-2,1);
matX(n,n:-1:1) = EdgeList((2*n)-1:(3*n)-2,1);
matX(n-1:-1:2,1) = EdgeList((3*n)-1:(4*n)-4,1);
matY = zeros(n); % wrap y vector around zeros matrix
matY(1,1:n) = EdgeList(1:n,2);
matY(2:n-1,n) = EdgeList(n+1:(2*n)-2,2);
matY(n,n:-1:1) = EdgeList((2*n)-1:(3*n)-2,2);
matY(n-1:-1:2,1) = EdgeList((3*n)-1:(4*n)-4,2);
centreMatX(i:n+i-1,i:n+i-1) = matX; % paste into main matrix
centreMatY(i:n+i-1,i:n+i-1) = matY;
xy(B(:,1),:) = 0; % discard values
xy = xy(all(xy,2),:);
x = xy(:,1);
y = xy(:,2);
end
centreMatX(centreMatX == 0) = x;
centreMatY(centreMatY == 0) = y;

Equally spaced points in a contour

I have a set of 2D points (not ordered) forming a closed contour, and I would like to resample them to 14 equally spaced points. It is a contour of a kidney on an image. Any ideas?
One intuitive approach (IMO) is to create an independent variable for both x and y. Base it on arc length, and interpolate on it.
% close the contour, temporarily
xc = [x(:); x(1)];
yc = [y(:); y(1)];
% current spacing may not be equally spaced
dx = diff(xc);
dy = diff(yc);
% distances between consecutive coordinates
dS = sqrt(dx.^2+dy.^2);
dS = [0; dS]; % including start point
% arc length, going along (around) snake
d = cumsum(dS); % here is your independent variable
perim = d(end);
Now you have an independent variable and you can interpolate to create N segments:
N = 14;
ds = perim / N;
dSi = ds*(0:N).'; %' your NEW independent variable, equally spaced
dSi(end) = dSi(end)-.005; % appease interp1
xi = interp1(d,xc,dSi);
yi = interp1(d,yc,dSi);
xi(end)=[]; yi(end)=[];
Try it using imfreehand:
figure, imshow('cameraman.tif');
h = imfreehand(gca);
xy = h.getPosition; x = xy(:,1); y = xy(:,2);
% run the above solution ...
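The arc-length approach assumes the points are already ordered along the contour, whereas the question says they are not. One simple way to order them, sketched below, is to sort by angle around the centroid. This is an assumption-laden shortcut: it only works when every ray from the centroid crosses the contour once (a star-shaped outline), which may or may not hold for a given kidney contour. The variable names are hypothetical.

```matlab
% Sketch: order scattered contour points by their angle around the centroid.
% Assumption: the contour is star-shaped with respect to its centroid.
x = [0 1 0 -1];                       % unordered sample points on a circle
y = [1 0 -1 0];

cx = mean(x);
cy = mean(y);
theta = atan2(y - cy, x - cx);        % angle of each point, in (-pi, pi]
[~, order] = sort(theta);             % ascending angle = counter-clockwise

xs = x(order);                        % ordered contour, ready for resampling
ys = y(order);
```

The ordered xs and ys can then be fed to the arc-length interpolation in place of x and y.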
Say your contour is defined by independent vector x and dependent vector y.
You can get your resampled x vector using linspace:
new_x = linspace(min(x),max(x),14); %14 to get 14 equally spaced points
Then use interp1 to get new_y values at each new_x point:
new_y = interp1(x,y,new_x);
There are a few interpolation methods to choose from - default is linear. See interp1 help for more info.

Convex hull / concave hull for multiple clusters in data

I have done a lot of reading on drawing polygons around clusters and realized convhull may be the best way forward. Basically I am looking for an elastic-like polygon to wrap around my cluster points.
My data is a matrix consisting of x (1st column) and y (2nd column) points which are grouped in clusters (3rd column). I have 700 such clusters, hence it is not feasible to plot each separately.
Is there a way to perform convhull for each cluster separately and then plot all of them on a single chart?
EDIT
Code I have written until now which isn't able to run convex hull on each individual cluster...
[ndata, text, alldata] = xlsread(fullfile(source_dir));
[~, y] = sort(ndata(:,end));
As = ndata(y,:);
lon = As(:,1);
lat = As(:,2);
cluster = As(:,3);
%% To find number of points in a cluster (repetitions)
rep = zeros(size(cluster));
for j = 1:length(cluster)
rep(j) = sum(cluster==cluster(j));
end
%% Less than 3 points in a cluster are filtered out
x = lon (rep>3);
y = lat (rep>3);
z = cluster (rep>3);
%% convex hull for each cluster plotted ....hold....then display all.
figure
hold on
clusters = unique(z);
for i = 1:length(z)
k=convhull(x(z==clusters(i)), y(z==clusters(i)));
plot(x, y, 'b.'); %# plot cluster points
plot(x(k),y(k),'r-'); %# plots only k indices, giving the convex hull
end
Below is an image of what is being displayed;
If this question has already been asked I apologize for repetition but please do direct me to the answer you'll see fit.
Please can anyone help with this, however trivial I'm really struggling!
I would iterate through all the clusters and do what you have already written, using hold on to accumulate all the hulls in the same plot. Something like this:
% Generate three clouds of points in 2D:
c1 = bsxfun(@plus, 0.5 * randn(50,2), [1 3]);
c2 = bsxfun(@plus, 0.6 * randn(20,2), [0 0]);
c3 = bsxfun(@plus, 0.4 * randn(20,2), [1 1]);
data = [c1, ones(50,1); ...
c2, 2*ones(20,1); ...
c3, 3*ones(20,1)];
% Plot the data points with different colors
clf
plot(c1(:,1), c1(:,2),'r+', 'LineWidth', 2);
hold on
plot(c2(:,1), c2(:,2),'k+', 'LineWidth', 2);
plot(c3(:,1), c3(:,2),'b+', 'LineWidth', 2);
x = data(:,1);
y = data(:,2);
cluster = data(:,3);
clusters = unique(cluster);
for i = 1:length(clusters)
px = x(cluster == clusters(i));
py = y(cluster == clusters(i));
if length(px) > 2
k = convhull(px, py);
plot(px(k), py(k), '-');
end
end
It gives the following result:
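With around 700 clusters it may also help to count the points per cluster in one pass, instead of comparing every label against all others, before deciding which clusters are large enough for convhull. A minimal sketch, assuming made-up cluster labels and hypothetical variable names:

```matlab
% Sketch: number of points in each point's cluster, via unique + accumarray.
% Assumption: "cluster" holds one label per point, as in the third data column.
cluster = [7; 7; 7; 9; 9; 4; 4; 4; 4];

[labels, ~, g] = unique(cluster);     % g maps each point to 1..numel(labels)
counts = accumarray(g(:), 1);         % points per distinct label
rep = counts(g(:));                   % per-point cluster size

keep = rep >= 3;                      % a 2D hull needs at least 3 points
```

Points with keep set to false belong to clusters that are too small for a hull and can be dropped before the plotting loop.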