Point cloud, cluster, blob detection - matlab

I have a binary image full noises. I detected the objects circled in red using median filter B = medfilt2(A, [m n])(Matlab) or medianBlur(src, dst, ksize)(openCV).
Could you suggest other methods to detect those objects in a more "academic" way, e.g probabilistic method, clustering, etc?

This example looks like the very scenario DBSCAN was designed for.
Lots of noise, but with a well understood density, and clusters with much higher density but arbitrary shape.

You can use any sort of clustering here and can start from k-means one.
You can find a pretty good example from Matlab to start with.

Related

K-means boundaries

Is there any way to find boundaries (coordinates) for a x-y data in kmeans clustering. I produced 8 clusters from the xy data which looks like below (each color represent one cluster). I need to get values of the boundaries for each cluster.
The ELKI tool that I usually use for clustering will generate the boundaries for you in the visualization. I don't know if it will also output the coordinates to a file though.
It's called a Voronoi diagram, and you need the dual, the Delaunay Triangulation to build it. You can easily find algorithms for that.
Beware that some edges will go to infinity (just imagine two clusters, how does their boundary look like? What are the coordinates of the boundary?)
Note that on your data set, this clustering does not appear to be very good. The boundaries between clusters look quite arbitrary to me.

Density Based Clustering with Representatives

I'm looking for a method to perform density based clustering. The resulting clusters should have a representative unlike DBSCAN.
Mean-Shift seems to fit those needs but doesn't scale enough for my needs. I have looked into some subspace clustering algorithms and only found CLIQUE using representatives, but this part is not implemented in Elki.
As I noted in the comments on the previous iteration of your question,
https://stackoverflow.com/questions/34720959/dbscan-java-library-with-corepoints
Density-based clustering does not assume there is a center or representative.
Consider the following example image from Wikipedia user Chire (BY-CC-SA 3.0):
Which object should be the representative of the red cluster?
Density-based clustering is about finding "arbitrarily shaped" clusters. These do not have a meaningful single representative object. They are not meant to "compress" your data - this is not a vector quantization method, but structure discovery. But it is the nature of such complex structure that it cannot be reduced to a single representative. The proper representation of such a cluster is the set of all points in the cluster. For geometric understanding in 2D, you can also compute convex hulls, for example, to get an area as in that picture.
Choosing representative objects is a different task. This is not needed for discovering this kind of structure, and thus these algorithms do not compute representative objects - it would waste CPU.
You could choose the object with the highest density as representative of the cluster.
It is a fairly easy modification to DBSCAN to store the neighbor count of every object.
But as Anony-Mousse mentioned, the object may nevertheless be a rather bad choice. Density-based clustering is not designed to yield representative objects.
You could try AffinityPropagation, but it will also not scale very well.

whether i have to choose classification or clustering for my project?

I just detected faces using Viola-jones algorithm. I cropped faces from frames(or video)and I made it as training set.In my video there are 5 different faces. I decided to use eigenfaces for face recognition.I ended with finding eucledian distance for a input image.What Iam supposed do now.Whether I have to use classification or clustering technique?In my project they told to use kmeans clustering.How it can be done using eucledian distance explain?Give some useful links so that I can understand in better way.
When you have 5 different faces, this sounds like classification to me. That is a label column!
I don't think k-means will get you anywhere. When you have high-dimensional data (such as images), k-means performs as good as random convex partitioning. I.e. it's largely useless.

Finding elongated clusters using MATLAB

Let me explain what I'm trying to do.
I have plot of an Image's points/pixels in the RGB space.
What I am trying to do is find elongated clusters in this space. I'm fairly new to clustering techniques and maybe I'm not doing things correctly, I'm trying to cluster using MATLAB's inbuilt k-means clustering but it appears as if that is not the best approach in this case.
What I need to do is find "color clusters".
This is what I get after applying K-means on an image.
This is how it should look like:
for an image like this:
Can someone tell me where I'm going wrong, and what I can to do improve my results?
Note: Sorry for the low-res images, these are the best I have.
Are you trying to replicate the results of this paper? I would say just do what they did.
However, I will add since there are some issues with the current answers.
1) Yes, your clusters are not spherical- which is an assumption k-means makes. DBSCAN and MeanShift are two more common methods for handling such data, as they can handle non spherical data. However, your data appears to have one large central clump that spreads outwards in a few finite directions.
For DBSCAN, this means it will put everything into one cluster, or everything is its own cluster. As DBSCAN has the assumption of uniform density and requires that clusters be separated by some margin.
MeanShift will likely have difficulty because everything seems to be coming from one central lump - so that will be the area of highest density that the points will shift toward, and converge to one large cluster.
My advice would be to change color spaces. RGB has issues, and it the assumptions most algorithms make will probably not hold up well under it. What clustering algorithm you should be using will then likely change in the different feature space, but hopefully it will make the problem easier to handle.
k-means basically assumes clusters are approximately spherical. In your case they are definitely NOT. Try fit a Gaussian to each cluster with non-spherical covariance matrix.
Basically, you will be following the same expectation-maximization (EM) steps as in k-means with the only exception that you will be modeling and fitting the covariance matrix as well.
Here's an outline for the algorithm
init: assign each point at random to one of k clusters.
For each cluster estimate mean and covariance
For each point estimate its likelihood to belong to each cluster
note that this likelihood is based not only on the distance to the center (mean) but also on the shape of the cluster as it is encoded by the covariance matrix
repeat stages 2 and 3 until convergence or until exceeded pre-defined number of iterations
Take a look at density-based clustering algorithms, such as DBSCAN and MeanShift. If you are doing this for segmentation, you might want to add pixel coordinates to your vectors.

Python Clustering Algorithms

I've been looking around scipy and sklearn for clustering algorithms for a particular problem I have. I need some way of characterizing a population of N particles into k groups, where k is not necessarily know, and in addition to this, no a priori linking lengths are known (similar to this question).
I've tried kmeans, which works well if you know how many clusters you want. I've tried dbscan, which does poorly unless you tell it a characteristic length scale on which to stop looking (or start looking) for clusters. The problem is, I have potentially thousands of these clusters of particles, and I cannot spend the time to tell kmeans/dbscan algorithms what they should go off of.
Here is an example of what dbscan find:
You can see that there really are two separate populations here, though adjusting the epsilon factor (the max. distance between neighboring clusters parameter), I simply cannot get it to see those two populations of particles.
Is there any other algorithms which would work here? I'm looking for minimal information upfront - in other words, I'd like the algorithm to be able to make "smart" decisions about what could constitute a separate cluster.
I've found one that requires NO a priori information/guesses and does very well for what I'm asking it to do. It's called Mean Shift and is located in SciKit-Learn. It's also relatively quick (compared to other algorithms like Affinity Propagation).
Here's an example of what it gives:
I also want to point out that in the documentation is states that it may not scale well.
When using DBSCAN it can be helpful to scale/normalize data or
distances beforehand, so that estimation of epsilon will be relative.
There is a implementation of DBSCAN - I think its the one
Anony-Mousse somewhere denoted as 'floating around' - , which comes
with a epsilon estimator function. It works, as long as its not fed
with large datasets.
There are several incomplete versions of OPTICS at github. Maybe
you can find one to adapt it for your purpose. Still
trying to figure out myself, which effect minPts has, using one and
the same extraction method.
You can try a minimum spanning tree (zahn algorithm) and then remove the longest edge similar to alpha shapes. I used it with a delaunay triangulation and a concave hull:http://www.phpdevpad.de/geofence. You can also try a hierarchical cluster for example clusterfck.
Your plot indicates that you chose the minPts parameter way too small.
Have a look at OPTICS, which does no longer need the epsilon parameter of DBSCAN.