K-means boundaries - cluster-analysis

Is there any way to find boundaries (coordinates) for a x-y data in kmeans clustering. I produced 8 clusters from the xy data which looks like below (each color represent one cluster). I need to get values of the boundaries for each cluster.

The ELKI tool that I usually use for clustering will generate the boundaries for you in the visualization. I don't know if it will also output the coordinates to a file though.
It's called a Voronoi diagram, and you need the dual, the Delaunay Triangulation to build it. You can easily find algorithms for that.
Beware that some edges will go to infinity (just imagine two clusters, how does their boundary look like? What are the coordinates of the boundary?)
Note that on your data set, this clustering does not appear to be very good. The boundaries between clusters look quite arbitrary to me.

Related

clustering based particle reconstruction

I have closely spaced datapoints in 3D space which form V shape with angle between 10 to 20 degree. The datapoints are not well separated. Is there any clustering algorithm that can separate the data into 2 clusters like two parts of V shape?
I have tried distance based algos like hierarchical clustering, K-means and density based dbscan, and pca, gmm too but nothing worked. Basically anything that minimizes distance of any sort do not work.
I am hereby attaching the picture of the data points and what i want the clusters to look like this: Image
Is there any other method that can work?
Thank you,

Providing Centroids and Then Clustering

I seem to find a lot of documentation based on computing centroids and clustering, but what if I assign centroid values themselves.
Say if I provide 14 different centroid vectors. How would I go about clustering my data to those 14 different centroid values?
Maybe this is an easy question, but I haven't found an answer online, so wanted to make sure.
If the centroids are predefined, then you are doing nearest-neighbor classification, not clustering. It's only clustering if the structure is not predefined.
Not sure this belongs in the python forum, but you just need to compute the distance from each of your points to each centroid, and then assign each point to that centroid that is closest. You then have your clusters, though some may be empty (no guarantee that a centroid will have at least one data point closest to it). You can do this by iterating over all of your points, or do it much more quickly in one step using matrices with numpy. I've got some code lying around somewhere if you need an example to get started.

Point cloud, cluster, blob detection

I have a binary image full noises. I detected the objects circled in red using median filter B = medfilt2(A, [m n])(Matlab) or medianBlur(src, dst, ksize)(openCV).
Could you suggest other methods to detect those objects in a more "academic" way, e.g probabilistic method, clustering, etc?
This example looks like the very scenario DBSCAN was designed for.
Lots of noise, but with a well understood density, and clusters with much higher density but arbitrary shape.
You can use any sort of clustering here and can start from k-means one.
You can find a pretty good example from Matlab to start with.

Finding elongated clusters using MATLAB

Let me explain what I'm trying to do.
I have plot of an Image's points/pixels in the RGB space.
What I am trying to do is find elongated clusters in this space. I'm fairly new to clustering techniques and maybe I'm not doing things correctly, I'm trying to cluster using MATLAB's inbuilt k-means clustering but it appears as if that is not the best approach in this case.
What I need to do is find "color clusters".
This is what I get after applying K-means on an image.
This is how it should look like:
for an image like this:
Can someone tell me where I'm going wrong, and what I can to do improve my results?
Note: Sorry for the low-res images, these are the best I have.
Are you trying to replicate the results of this paper? I would say just do what they did.
However, I will add since there are some issues with the current answers.
1) Yes, your clusters are not spherical- which is an assumption k-means makes. DBSCAN and MeanShift are two more common methods for handling such data, as they can handle non spherical data. However, your data appears to have one large central clump that spreads outwards in a few finite directions.
For DBSCAN, this means it will put everything into one cluster, or everything is its own cluster. As DBSCAN has the assumption of uniform density and requires that clusters be separated by some margin.
MeanShift will likely have difficulty because everything seems to be coming from one central lump - so that will be the area of highest density that the points will shift toward, and converge to one large cluster.
My advice would be to change color spaces. RGB has issues, and it the assumptions most algorithms make will probably not hold up well under it. What clustering algorithm you should be using will then likely change in the different feature space, but hopefully it will make the problem easier to handle.
k-means basically assumes clusters are approximately spherical. In your case they are definitely NOT. Try fit a Gaussian to each cluster with non-spherical covariance matrix.
Basically, you will be following the same expectation-maximization (EM) steps as in k-means with the only exception that you will be modeling and fitting the covariance matrix as well.
Here's an outline for the algorithm
init: assign each point at random to one of k clusters.
For each cluster estimate mean and covariance
For each point estimate its likelihood to belong to each cluster
note that this likelihood is based not only on the distance to the center (mean) but also on the shape of the cluster as it is encoded by the covariance matrix
repeat stages 2 and 3 until convergence or until exceeded pre-defined number of iterations
Take a look at density-based clustering algorithms, such as DBSCAN and MeanShift. If you are doing this for segmentation, you might want to add pixel coordinates to your vectors.

Writing ELKI DBSCAN convex hull of clusters to file

I have started using ELKI for data analysis, but one seemingly simple thing I cannot seem to do is output the calculated convex hull of clusters to a file after running DBSCAN. I am able to visualize the convex hulls via the visualization gui, but I cannot generate the KML file. I am also able to write my clustering results to a folder (using the ResultWriter resulthandler), but no file is generated when I set the KMLOutputHandler. I receive no error message in the log window (even with verbose parameter set to true).
Is there a trick to generating a KML file in ELKI? Could anyone walk through the steps of doing this?
Any help would be appreciated.
(as an aside, is it possible to generate alpha shapes for DBSCAN results with ELKI? If so, which parameter must be adjusted?)
So that is actually a lot of questions in one...
Cunvex hulls: they are used in ELKI for visualization, but not considered part of the output result, so they are not saved to file. A trick you could employ is to save the visualization as SVG and extract them from this file, but they will then be in a different coordinate system.
One of the reasons for this is that the convex hulls are only implemented for 2D Euclidean space - I figure you want to use it for spatial data, where it may actually happen to not return the correct convex hull then due to the curvature of the earth surface. Furthermore, many data sets will be of higher dimensionality.
However, you can of course look at the source code and invoke the convex hull algorithm, then write the result to your favorite output format. In general, just as you will need to spend time on preprocessing, you will also need to customize the output.
Which brings me to the second question. The KMLResultHandler is closely tied to the publication of ELKI 0.4.0: Spatial Outlier Detection: Data, Algorithms, Visualizations.
Which pretty much summarizes what this class does: visualize spatial outlier detection. It currently does not (yet) include code to visualize clusters of spatial data, for example. In order to get an output from the class, you need to ensure a number of restrictions, unfortunately. Essentially, if it finds a Polygon relation and a OutlierResult that it can map to each other, it will output this to KML.
It is not yet a class that could write arbitrary results to KML. It probably needs a lot more of documentation, too. Contributions of a more general output tool would be appreciated; but a customizeable, automatic, general output to KML is really hard to do. In particular, you may also end up having to include projection capabilities then, if someone is not processing Latitude-Longitude data, but e.g. UTM projected data.
As such, I recommend looking at the source code of the class and customizing it to your needs. In my opinion, visualization to KML will always require a lot of customization.
To generate alpha shapes (only the hull, not the extended alpha shape - the optimal visualization of DBSCAN would likely consist of the alpha shape of the core points only, extended by a radius of epsilon, which should then include the border points. This is on the wish list, but not implemented), you just need to set the -hull.alpha parameter to the desired alpha value. Note that this happens in the visualization projection, not at the raw data. If the axes are scaled differently, alpha shapes will look differently. Again, you may be interested in using the class AlphaShape on the raw data vectors, instead of exporting the projected visualization. Then you can easily write the resulting Polygons to your custom visualization.
If you implement such a KML visualization using alpha shapes (or convex hulls) for clusters, I would appreciate if you could contribute this to ELKI to make it available for others as well. Thank you.