I did a k-means cluster analysis using the kml and kml3d packages. Now I want to extract the cluster centroids and use them to predict cluster membership for another dataset. Does anyone know how to do that?
Thanks in advance!
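In case it helps frame the question: once the centroids are available as a plain matrix, prediction is just nearest-centroid assignment. A minimal Python/numpy sketch of that general logic (kml itself is an R package, so the export path and the predict_clusters helper here are hypothetical):

```python
import numpy as np

def predict_clusters(X_new, centroids):
    """Assign each row of X_new to its nearest centroid (Euclidean)."""
    # pairwise distances, shape (n_samples, n_clusters)
    d = np.linalg.norm(X_new[:, None, :] - centroids[None, :, :], axis=2)
    return d.argmin(axis=1)

# hypothetical usage, with centroids exported from the fitted kml object:
# centroids = np.loadtxt("kml_centroids.csv", delimiter=",")
# labels = predict_clusters(X_new, centroids)
```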
I am trying to extract a fixed (and known) number of clusters from a set of points using Matlab.
The first clustering method I tried is the k-means algorithm, which seems to tick all the boxes.
Unfortunately, in some cases, the subsets (or clusters) extracted are intertwined, as shown in the image below for the left-most cluster:
[image: k-means result in which the left-most cluster is intertwined with a neighboring cluster]
Is there a way to configure the k-means algorithm so that the generated clusters are disconnected?
Is there a way to post-process the cluster indices returned by the k-means algorithm so as to obtain "disconnected" clusters?
Alternatively, is there another clustering method that might be more suitable?
Thanks!
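For reference, here is one way to make the post-processing idea concrete: split each k-means cluster into its spatially connected components. This is a Python sketch of the general approach (the question is about Matlab, and the `radius` connectivity threshold is an assumed parameter you would have to tune):

```python
import numpy as np
from scipy.sparse.csgraph import connected_components
from sklearn.neighbors import radius_neighbors_graph

def split_disconnected(X, labels, radius):
    """Relabel each cluster so every connected component gets its own label."""
    new_labels = np.empty_like(labels)
    next_id = 0
    for k in np.unique(labels):
        idx = np.where(labels == k)[0]
        # connect points of this cluster that lie within `radius` of each other
        g = radius_neighbors_graph(X[idx], radius=radius, mode="connectivity")
        n_comp, comp = connected_components(g, directed=False)
        new_labels[idx] = comp + next_id
        next_id += n_comp
    return new_labels
```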
With sklearn.cluster.AgglomerativeClustering I need to specify the number of resulting clusters in advance. What I would like to do instead is merge clusters until a certain maximum distance between clusters is reached, and then stop the clustering process.
Accordingly, the number of clusters may vary depending on the structure of the data. I do not care about the number of resulting clusters or their sizes, only that the cluster centroids do not exceed a certain distance from one another.
How can I achieve this?
This pull request for a distance_threshold parameter in scikit-learn's agglomerative clustering may be of interest:
https://github.com/scikit-learn/scikit-learn/pull/9069
It looks like it'll be merged in version 0.22.
EDIT: See my answer to my own question for an example of implementing single linkage clustering with a distance based stopping criterion using scipy.
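Once that parameter is available, usage would look roughly like this (a sketch assuming a scikit-learn release that includes the PR; the 0.5 threshold is an arbitrary placeholder):

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

X = np.random.rand(40, 2)

# n_clusters must be None when distance_threshold is given; merging
# stops before any link longer than the threshold would be created
model = AgglomerativeClustering(n_clusters=None,
                                distance_threshold=0.5,
                                linkage="single")
labels = model.fit_predict(X)
print(model.n_clusters_)  # number of clusters actually found
```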
Use scipy directly instead of sklearn. IMHO, it is much better.
Hierarchical clustering is a three-step process:
1. Compute the dendrogram.
2. Visualize and analyze it.
3. Extract branches.
But that doesn't fit the supervised-learning-oriented API of sklearn, which would like everything to implement fit and predict methods...
SciPy has a function for you:
https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.fcluster.html#scipy.cluster.hierarchy.fcluster
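A minimal sketch of all three steps (the 0.1 cut-off is an arbitrary placeholder):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

X = np.random.rand(50, 2)

# step 1: compute the dendrogram (single linkage here)
Z = linkage(pdist(X), method="single")

# step 2 would be scipy.cluster.hierarchy.dendrogram(Z) for inspection

# step 3: extract branches, cutting wherever a merge distance exceeds 0.1
labels = fcluster(Z, t=0.1, criterion="distance")
```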
I want to cluster data without k-means; for example, I would prefer to cluster with DBSCAN or support vector clustering.
So I need to evaluate the clustering performance with the Davies-Bouldin metric, but I don't know how to calculate Davies-Bouldin in RapidMiner for DBSCAN or support vector clustering.
Please help me.
Thank you.
The Cluster Distance Performance operator allows the Davies-Bouldin validity measure to be calculated. It requires a cluster model containing the cluster centroids, which means approaches like DBSCAN and support vector clustering cannot be used with it, because they do not produce cluster centroids.
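One possible workaround outside RapidMiner: the Davies-Bouldin index only needs the data and the labels, because the centroids can be recomputed from them. If you can export the cluster assignments, you could score them with, for example, scikit-learn's davies_bouldin_score (a sketch; the eps and min_samples values are arbitrary):

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.metrics import davies_bouldin_score

X = np.random.rand(200, 2)
labels = DBSCAN(eps=0.1, min_samples=5).fit_predict(X)

# drop DBSCAN's noise points (label -1); the score needs >= 2 clusters
mask = labels != -1
if len(np.unique(labels[mask])) > 1:
    print(davies_bouldin_score(X[mask], labels[mask]))
```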
I have to cluster data consisting of power profiles of solar panel output. I tried various algorithms, from classical k-means to shape-based clustering. I have to decide the number of clusters present in the pool of data, and I always get 2 clusters, so I think they are very dense.
Is there any way I can partition a dense cluster?
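One simple option is a bisecting step: re-run k-means inside the dense cluster and keep the split if an internal measure such as the silhouette improves. A sketch (the array shape and k values are assumed placeholders):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.rand(200, 24)  # stand-in for daily power profiles

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# bisect the most populous cluster by clustering its members again with k=2
dense = np.argmax(np.bincount(labels))
idx = np.where(labels == dense)[0]
sub = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X[idx])
labels[idx[sub == 1]] = labels.max() + 1  # now three clusters overall
```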
I have a binary matrix of size 20 by 300. I want to cluster the 20 variables into five or six groups. So far I have used the k-means and hierarchical clustering algorithms in Matlab with different distance metrics, but both give me non-overlapping clusters. I can see in my data that some of the variables should belong to more than one group. Does anyone know of a way to do overlapping clustering in either Matlab or R? Any help is greatly appreciated.
Thanks in advance!
Have a look at fuzzy clustering in the MATLAB documentation: http://www.mathworks.com/help/toolbox/fuzzy/fp310.html
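The same idea (fuzzy c-means, the family behind MATLAB's fcm) is also available in Python via the third-party scikit-fuzzy package; a sketch, with c=5 and the 0.3 membership cut-off as assumed placeholders:

```python
import numpy as np
import skfuzzy as fuzz  # third-party scikit-fuzzy package

# 20 variables observed over 300 columns, as in the question
X = np.random.randint(0, 2, size=(20, 300)).astype(float)

# skfuzzy expects data as (features, samples), so pass X.T to cluster the rows
cntr, u, *_ = fuzz.cluster.cmeans(X.T, c=5, m=2.0, error=0.005, maxiter=1000)

# u[k, i] is the membership of variable i in cluster k;
# thresholding lets a variable belong to several clusters at once
overlapping = u > 0.3
```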
Look for Weka4OC (Java) or ADPROCLUS (R), which are able to build overlapping clusters.