clustering based particle reconstruction - unsupervised-learning

I have closely spaced datapoints in 3D space which form V shape with angle between 10 to 20 degree. The datapoints are not well separated. Is there any clustering algorithm that can separate the data into 2 clusters like two parts of V shape?
I have tried distance based algos like hierarchical clustering, K-means and density based dbscan, and pca, gmm too but nothing worked. Basically anything that minimizes distance of any sort do not work.
I am hereby attaching the picture of the data points and what i want the clusters to look like this: Image
Is there any other method that can work?
Thank you,

Related

K-means boundaries

Is there any way to find boundaries (coordinates) for a x-y data in kmeans clustering. I produced 8 clusters from the xy data which looks like below (each color represent one cluster). I need to get values of the boundaries for each cluster.
The ELKI tool that I usually use for clustering will generate the boundaries for you in the visualization. I don't know if it will also output the coordinates to a file though.
It's called a Voronoi diagram, and you need the dual, the Delaunay Triangulation to build it. You can easily find algorithms for that.
Beware that some edges will go to infinity (just imagine two clusters, how does their boundary look like? What are the coordinates of the boundary?)
Note that on your data set, this clustering does not appear to be very good. The boundaries between clusters look quite arbitrary to me.

Clustering of 3D points

I have a large dataset of around 20 million points (x,y,z) in a 3-dimensional space. I know these points are organized in dense regions, but that these regions vary in size. I think a standard unsupervised 3D clustering should solve my problem.
Since I can't estimate the number of clusters a priori, I tried using k-means with a wide range for k, but it is slow and also, I would have to estimate how significant each k-partition is.
Basically, my question is: how can I extract the most significant partition of my points into clusters?
k-means is probably not the best alhorithm for such data.
DBSCAN should be closer to your intuition of dense regions.
Try on a sample first, then figure out how to scale up.
It is not clear to me from the above if you're going to use k-means or not, but if you are, you should be following the responses from the post below which shows how to measure variance of the clusters.
Calculating the percentage of variance measure for k-means?
Additionally, you can get a good fit using 'the elbow method' by trying 2 to 15 k sized clusters. See the answer from Amro for the process on this.
One simple idea in this case is to use 3 different clusterings, along each dimension. That might speed things up.
So you find clusters along X axis (project all the points down to X axis) and then continue to form sub clusters along the Y axis and then along the Z axis.
I think 1-D k-means can be solved very efficiently using dynamic programming http://www.sciencedirect.com/science/article/pii/0025556473900072.

On how to apply k means clustering and outlining the clusters

I am reading about applications of clustering in human motion analysis. I started out with random numbers and applied k-means clustering algorithm but I wanted to have some graphs that circle the clusters as shown in the picture. Basically, the lines represent the motion trajectory. I will appreciate ideas on how to obtain motion trajectory of a person. Application is patient monitoring where the trajectory will be used in abnormal behavior activity.
I will be using a kinect and recording the motion trajectory based on skeleton tracking. So, I will be recording the 4 quaternion values of Head, Shoulder and Torso joints and the RGBD (Red green blue Depth) that is combined as 1 value for these joints. So, a total of 4*3 + 3 = 15 time series. So, there are 15 variables. How do I convert them to represent the trajectories shown below and then apply clustering to cluster trajectories. The clusters will allow in classification.
Can somebody please show how to obtain the diagram similar to the one attached? and how do I fuse and convert the 15 time series from each person into a single trajectory.
The picture illustrates the number of clusters that are generated for the time series. Thank you in advance.
K-means is a bad fit for trajectories.
It needs to be able to compute the mean (which is why it is called "k-means"). Having a stable, sensible mean is important. But how meaningful is the mean of some time series, even if you could define it (and the series weren't e.g. of different length, and different movement speed)?
Try hierarchical clustering, and multivariate dynamic time warping.

Finding elongated clusters using MATLAB

Let me explain what I'm trying to do.
I have plot of an Image's points/pixels in the RGB space.
What I am trying to do is find elongated clusters in this space. I'm fairly new to clustering techniques and maybe I'm not doing things correctly, I'm trying to cluster using MATLAB's inbuilt k-means clustering but it appears as if that is not the best approach in this case.
What I need to do is find "color clusters".
This is what I get after applying K-means on an image.
This is how it should look like:
for an image like this:
Can someone tell me where I'm going wrong, and what I can to do improve my results?
Note: Sorry for the low-res images, these are the best I have.
Are you trying to replicate the results of this paper? I would say just do what they did.
However, I will add since there are some issues with the current answers.
1) Yes, your clusters are not spherical- which is an assumption k-means makes. DBSCAN and MeanShift are two more common methods for handling such data, as they can handle non spherical data. However, your data appears to have one large central clump that spreads outwards in a few finite directions.
For DBSCAN, this means it will put everything into one cluster, or everything is its own cluster. As DBSCAN has the assumption of uniform density and requires that clusters be separated by some margin.
MeanShift will likely have difficulty because everything seems to be coming from one central lump - so that will be the area of highest density that the points will shift toward, and converge to one large cluster.
My advice would be to change color spaces. RGB has issues, and it the assumptions most algorithms make will probably not hold up well under it. What clustering algorithm you should be using will then likely change in the different feature space, but hopefully it will make the problem easier to handle.
k-means basically assumes clusters are approximately spherical. In your case they are definitely NOT. Try fit a Gaussian to each cluster with non-spherical covariance matrix.
Basically, you will be following the same expectation-maximization (EM) steps as in k-means with the only exception that you will be modeling and fitting the covariance matrix as well.
Here's an outline for the algorithm
init: assign each point at random to one of k clusters.
For each cluster estimate mean and covariance
For each point estimate its likelihood to belong to each cluster
note that this likelihood is based not only on the distance to the center (mean) but also on the shape of the cluster as it is encoded by the covariance matrix
repeat stages 2 and 3 until convergence or until exceeded pre-defined number of iterations
Take a look at density-based clustering algorithms, such as DBSCAN and MeanShift. If you are doing this for segmentation, you might want to add pixel coordinates to your vectors.

How to extract useful features from a graph?

Things are like this:
I have some graphs like the pictures above and I am trying to classify them to different kinds so the shape of a character can be recognized, and here is what I've done:
I apply a 2-D FFT to the graphs, so I can get the spectral analysis of these graphs. And here are some result:
S after 2-D FFT
T after 2-D FFT
I have found that the same letter share the same pattern of magnitude graph after FFT, and I want to use this feature to cluster these letters. But there is a problem: I want the features of interested can be presented in a 2-D plane, i.e in the form of (x,y), but the features here is actually a graph, with about 600*400 element, and I know the only thing I am interested is the shape of the graph(S is a dot in the middle, and T is like a cross). So what can I do to reduce the dimension of the magnitude graph?
I am not sure I am clear about my question here, but thanks in advance.
You can use dimensionality reduction methods such as
k-means clustering
SVM
PCA
MDS
Each of these methods can take 2-dimensional arrays, and work out the best coordinate frame to distinguish / represent etc your letters.
One way to start would be reducing your 240000 dimensional space to a 26-dimensional space using any of these methods.
This would give you an 'amplitude' for each of the possible letters.
But as #jucestain says, a network classifiers are great for letter recognition.