kmeans on fisherIris data - matlab

I have the following script for kmeans in Matlab:
load fisheriris
k = 3;
clusterIndex = kmeans(meas,3);
scatter(meas(:,1),meas(:,2),[],clusterIndex, 'filled')
How can I plot the centroid of each cluster?
Please help!

Straight from the docs:
[IDX,C] = kmeans(X,k) returns the k cluster centroid locations in the
k-by-p matrix C
So in your case simply do this:
[clusterIndex, centroids] = kmeans(meas,3);
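To overlay the centroids on the scatter plot, something like this should do it (plotting the same two columns used for the scatter):
scatter(meas(:,1),meas(:,2),[],clusterIndex,'filled')
hold on
% the centroids live in the same 4-D feature space as meas, so plot their first two columns
plot(centroids(:,1), centroids(:,2), 'kx', 'MarkerSize', 12, 'LineWidth', 2)
hold off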
By the way, you might like gscatter; it will colour your clusters nicely for you.
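A minimal example (gscatter also builds a legend for the groups):
gscatter(meas(:,1), meas(:,2), clusterIndex)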

Related

Creating Clusters in matlab

Suppose that I have generated some data in matlab as follows:
n = 100;
x = randi(n,[n,1]);
y = rand(n,1);
data = [x y];
plot(x,y,'rx')
axis([0 100 0 1])
Now I want to write an algorithm to classify all these data into some clusters (which are arbitrary), such that a point is a member of a cluster only if the distance between it and at least one member of the cluster is less than 10. How could I write this code?
The clustering method you are describing is DBSCAN. Note that this algorithm will find only one cluster in the provided data, since it is very unlikely that there is a point in the dataset whose distance to all other points is more than 10.
If this is really what you want, you can use the built-in dbscan, or the DBSCAN implementation posted on the File Exchange if you are using a version older than R2019a.
% Generating random points, almost similar to the data provided by OP
data = bsxfun(@times, rand(100, 2), [100 1]);
% Adding more random points
for i=1:5
mu = rand(1, 2)*100 -50;
A = rand(2)*5;
sigma = A*A' + eye(2)*(1+rand*2);
data = [data;mvnrnd(mu,sigma,20)];
end
% clustering using DBSCAN, with epsilon = 10 and min-points = 1
idx = DBSCAN(data, 10, 1);
% plotting clusters
numCluster = max(idx);
colors = lines(numCluster);
scatter(data(:, 1), data(:, 2), 30, colors(idx, :), 'filled')
title(['No. of Clusters: ' num2str(numCluster)])
axis equal
The numbers in the figure above show the distances between the closest pairs of points in any two different clusters.
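On R2019a or newer, the built-in function can be called the same way, for example:
idx = dbscan(data, 10, 1);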
The Matlab built-in function clusterdata() works well for what you're asking.
Here is how to apply it to your example:
% number of points
n = 100;
% create the data
x = randi(n,[n,1]);
y = rand(n,1);
data = [x y];
% the number of clusters you want to create
num_clusters = 5;
T1 = clusterdata(data,'Criterion','distance',...
'Distance','euclidean',...
'MaxClust', num_clusters)
scatter(x, y, 100, T1,'filled')
In this case, I used 5 clusters and Euclidean distance as the metric to group the data points, but you can always change that (see the documentation of clusterdata()).
See the result below for 5 clusters with some random data.
Note that the data is skewed (x-values range from 0 to 100 and y-values from 0 to 1), so the results are also skewed; you could always normalize your data first.
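If you do want to normalize first, here is a quick sketch using zscore (Statistics and Machine Learning Toolbox) so that each column has zero mean and unit variance before clustering:
% standardize each column, then cluster the standardized data
data_z = zscore(data);
T2 = clusterdata(data_z,'Criterion','distance',...
'Distance','euclidean',...
'MaxClust', num_clusters);
scatter(x, y, 100, T2,'filled')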
Here is a way using the connected components of graph:
D = pdist2(data, data) < 10;
D(1:size(D,1)+1:end) = 0;
G = graph(D);
C = conncomp(G);
The connected components output C is a vector that gives the cluster number of each point.
Use pdist2 to compute the pairwise distance matrix of the data points.
Use the distance matrix to create a logical adjacency matrix that shows two point are neighbors if distance between them is less than 10.
Set the diagonal elements of the adjacency matrix to 0 to eliminate self loops.
Create a graph from the adjacency matrix.
Compute the connected components of graph.
Note that pdist2 may not be feasible for large datasets, and you would need other methods to form a sparse adjacency matrix; see the sketch below.
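For example, here is a sketch of a sparse construction using rangesearch (Statistics and Machine Learning Toolbox), which only stores the pairs that are closer than 10:
% for every point, find the indices of all points within a radius of 10
nbrs = rangesearch(data, data, 10);   % cell array, one cell per point
n = size(data, 1);
% build a sparse adjacency matrix from the neighbour lists
rows = repelem((1:n)', cellfun(@numel, nbrs));
cols = horzcat(nbrs{:})';
A = sparse(rows, cols, 1, n, n);
A = A - speye(n);                     % each point is its own neighbour, so drop self loops
C = conncomp(graph(A));               % cluster number of every point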
I noticed after posting my answer that the answer provided by @saastn suggests the DBSCAN algorithm, which follows nearly the same approach.

clustering based on k-means centroids

I have lung images from 1500 patients, and I am trying to apply kmeans to them. My problem is: I want to apply kmeans to one patient (who has 230 images), save the centroids for that patient, and then cluster the other patients' images based on these centroids. This is the MATLAB code:
[idx,C] = kmeans(data,80)
Now I have C, but what should I do to use it and apply these centroids to the other images as well?
Here's what my data looks like; I am clustering based upon the histograms of these images.
Img1: histogram with 16 bins
Img2: histogram with 16 bins
Img3: histogram with 16 bins
Img4: histogram with 16 bins
...
Please suggest any tutorial or anything else that might help. Thank you.
In k-means, the membership of each point is determined by the closest center. Therefore, after you have the centers, you can keep assigning new points by checking their distance from each center. In MATLAB you can easily do it with pdist2:
dim = 2;
n = 100;
% generate two data sets
data1 = rand(n,dim);
data2 = rand(n,dim);
% computing membership & clusters using kmeans on data1
k = 5;
[idx1,C] = kmeans(data1,k);
% computing membership using pairwise distance on data2
D = pdist2(data2,C);
[~,idx2] = min(D,[],2);
% plot centers
scatter(C(:,1),C(:,2),100,1:k,'*')
hold on
% plot data1
scatter(data1(:,1),data1(:,2),30,idx1,'filled')
% plot data2
scatter(data2(:,1),data2(:,2),30,idx2)
legend('centers','data1','data2')
If you want, you can even plot the membership boundaries using a Voronoi diagram:
voronoi(C(:,1),C(:,2));
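Applied to your histogram data, a quick sketch (patientA_hist and patientB_hist are placeholder names; each row is assumed to be one image's 16-bin histogram):
% cluster the first patient's histograms and keep the centroids
[idxA, C] = kmeans(patientA_hist, 80);
% assign the other patient's histograms to the nearest saved centroid
D = pdist2(patientB_hist, C);
[~, idxB] = min(D, [], 2);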

What image data should be given to kmeans clustering in matlab?

I have 100 images in my database. I am using those 100 images as both the training set and the test images. I have to make 5 clusters. I am using eigenfaces (PCA) for feature extraction. What data should be given to the kmeans command in MATLAB?
Syntax for kmeans command:
[IDX,C] = kmeans(X,k)
1. What is the X value?
2. Do we have to give the Euclidean distance as input?
3. Do we have to give the weight vectors of the input images?
Please explain in detail.
Source code I tried:
X = []
srcFiles = dir('C:\Users\rahul\Desktop\tomorow\*.jpg'); % the folder in which your images exist
for i = 1 : length(srcFiles)
filename = strcat('C:\Users\rahul\Desktop\tomorow\',srcFiles(b).name);
Imgdata = imread(filename);
X(:, i) = princomp(Imgdata);
end
[idx, c] = kmeans(X, 5)
Error I am getting:
Index exceeds matrix dimensions.
Error in pca (line 4)
filename =strcat('C:\Users\rahul\Desktop\tomorow\',srcFiles(b).name);
The PCA function you are using (I don't know what it is exactly) produces a vector of n numbers. This vector describes the picture and is what needs to be given to the k-means algorithm.
First of all, run the PCA for all 100 images, producing an n-by-100 matrix (one column per image).
X = []
for i = 1 : 100
X(:, i) = PCA(picture...)
end
If your PCA function returns a row instead of a column, you need
X(:, i) = PCA(picture)'
The kmeans function takes this data, as well as the number of clusters k. Note that kmeans treats each row as an observation, so with one image per column you should transpose X before clustering:
[idx, c] = kmeans(X', 5);
The distance used for clustering is squared Euclidean by default. If you want a different distance metric, you can supply it as a parameter. See the table here for the available distance metrics.
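For example, a minimal sketch passing a different metric ('cosine' is one of the built-in options):
[idx, c] = kmeans(X', 5, 'Distance', 'cosine');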
Finally, the standard k-means algorithm is not weighted, so you can't supply weights to the vectors.

creating legend for scatter3 plot (Matlab)

I have a matrix of points X in 3 dimensions (X is an N-by-3 matrix) and those points belong to clusters. The cluster each point belongs to is given by the N-by-1 vector Cluster (it has values like 1, 2, 3, ...). So I am plotting it with scatter3 like this:
scatter3(X(:,1),X(:,2),X(:,3),15,Cluster)
It works fine, but I would like to add a legend to it, showing the colored markers and the cluster each represents.
For example, if I have 3 clusters, I would like to have a legend like:
<blue o> - Cluster 1
<red o> - Cluster 2
<yellow o> - Cluster 3
Thank you very much for the help!
Instead of using scatter3, I suggest you use plot3, which will make labeling much simpler:
%# find out how many clusters you have
uClusters = unique(Cluster);
nClusters = length(uClusters);
%# create colormap
%# distinguishable_colors from the File Exchange
%# is great for distinguishing groups instead of hsv
cmap = hsv(nClusters);
%# plot, set DisplayName so that the legend shows the right label
figure,hold on
for iCluster = 1:nClusters
clustIdx = Cluster==uClusters(iCluster);
plot3(X(clustIdx,1),X(clustIdx,2),X(clustIdx,3),'o','MarkerSize',15,...
'Color',cmap(iCluster,:),'DisplayName',sprintf('Cluster %i',uClusters(iCluster)));
end
legend('show');
Either you use legend:
h = scatter3(X(:,1),X(:,2),X(:,3),15,Cluster);
hstruct = get(h);
legend(hstruct.Children, "Cluster1", "Cluster2", "Cluster3");
or you use annotation.
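A minimal sketch of the annotation route (the textbox positions are in normalized figure units and are purely illustrative):
scatter3(X(:,1),X(:,2),X(:,3),15,Cluster)
% place one textbox per cluster near the top-right corner of the figure
annotation('textbox',[0.75 0.82 0.2 0.05],'String','Cluster 1','EdgeColor','none')
annotation('textbox',[0.75 0.77 0.2 0.05],'String','Cluster 2','EdgeColor','none')
annotation('textbox',[0.75 0.72 0.2 0.05],'String','Cluster 3','EdgeColor','none')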

MATLAB: draw centroids

My main question is: given a feature centroid, how can I draw it in MATLAB?
In more detail, I have an NxNx3 image (an RGB image) of which I take 4x4 blocks and compute a 6-dimensional feature vector for each block. I store these feature vectors in an Mx6 matrix on which I run kmeans function and obtain the centroids in a kx6 matrix, where k is the number of clusters and 6 is the number of features for each block.
How can I draw these cluster centers on my image in order to visualize whether the algorithm is performing the way I wish? If anyone has other suggestions on how I can visualize the centroids on my image, I'd greatly appreciate it.
Here's one way you can visualize the clusters:
As you described, first I extract the blocks, compute the feature vector for each, and cluster this feature matrix.
Next we can visualize the clusters assigned to each block. Note that I am assuming that the 4x4 blocks are distinct; this is important so that we can map the blocks back to their locations in the original image.
Finally, in order to display the cluster centroids on the image, I simply find the closest block to each cluster and display it as a representative of that cluster.
Here's a complete example to show the above idea (in your case, you would want to replace the function that computes the features of each block by your own implementation; I am simply taking the min/max/mean/median/Q1/Q3 as my feature vector for each 4x4 block):
%# params
NUM_CLUSTERS = 3;
BLOCK_SIZE = 4;
featureFunc = @(X) [min(X); max(X); mean(X); prctile(X, [25 50 75])];
%# read image
I = imread('peppers.png');
I = double( rgb2gray(I) );
%# extract blocks as columns
J = im2col(I, [BLOCK_SIZE BLOCK_SIZE], 'distinct'); %# 16-by-NumBlocks
%# compute features for each block
JJ = featureFunc(J)'; %# NumBlocks-by-6
%# cluster blocks according to the features extracted
[clustIDX, ~, ~, Dist] = kmeans(JJ, NUM_CLUSTERS);
%# display the cluster index assigned for each block as an image
cc = reshape(clustIDX, ceil(size(I)/BLOCK_SIZE));
RGB = label2rgb(cc);
imshow(RGB), hold on
%# find and display the closest block to each cluster
[~,idx] = min(Dist);
[r c] = ind2sub(ceil(size(I)/BLOCK_SIZE), idx);
for i=1:NUM_CLUSTERS
text(c(i)+2, r(i), num2str(i), 'fontsize',20)
end
plot(c, r, 'k.', 'markersize',30)
legend('Centroids')
The centroids do not correspond to coordinates in the image, but to coordinates in the feature space. There are two ways you can test how well kmeans performed. For both, you first want to associate the points with their closest cluster. You get this information from the first output of kmeans.
(1) You can visualize the clustering result by reducing the 6-dimensional space to 2 or 3-dimensional space and then plotting the differently classified coordinates in different colors.
Assuming that the feature vectors are collected in an array called featureArray, and that you asked for nClusters clusters, you'd do the plot as follows using mdscale to transform the data to, say, 3D space:
%# kmeans clustering
[idx,centroids6D] = kmeans(featureArray,nClusters);
%# find the dissimilarity between features in the array for mdscale.
%# Add the cluster centroids to the points, so that they get transformed by mdscale as well.
%# I assume that you use Euclidean distance.
dissimilarities = pdist([featureArray;centroids6D]);
%# transform onto 3D space
transformedCoords = mdscale(dissimilarities,3);
%# create colormap with nClusters colors
cmap = hsv(nClusters);
%# loop to plot
figure
hold on,
for c = 1:nClusters
%# plot the coordinates
currentIdx = find(idx==c);
plot3(transformedCoords(currentIdx,1),transformedCoords(currentIdx,2),...
transformedCoords(currentIdx,3),'.','Color',cmap(c,:));
%# plot the cluster centroid with a black-edged square
plot3(transformedCoords(end-nClusters+c,1),transformedCoords(end-nClusters+c,2),...
transformedCoords(end-nClusters+c,3),'s','MarkerFaceColor',cmap(c,:),...
'MarkerEdgeColor','k');
end
(2) Alternatively, you can create a pseudo-colored image that shows which part of the image belongs to which cluster.
Assuming that you have nRows-by-nCols blocks, you write:
%# kmeans clustering
[idx,centroids6D] = kmeans(featureArray,nClusters);
%# create image
img = reshape(idx,nRows,nCols);
%# create colormap
cmap = hsv(nClusters);
%# show the image and color according to clusters
figure
imshow(img,[])
colormap(cmap)