What do I actually do with the winning neuron of a Self-Organizing Map?

I've reviewed weeks' worth of material, and everyone stops at identifying the winning neuron and plotting the SOM.
However, how do I actually use this output?

Unsupervised SOMs are mainly used for clustering and visualization.
Clustering: Find data points whose winning neurons are close to each other on the map. These data points share similarities and may belong to the same cluster or class.
Visualization: Plot the SOM grid and overlay the winning neuron of each data point on it. Examples are the World Poverty Map, RGB clustering, and this blobs example.
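As a concrete starting point, here is a minimal MATLAB (Deep Learning Toolbox) sketch that extracts the winning neuron per data point from a trained SOM; the data matrix x (one sample per column) is an assumption:

net = selforgmap([5 5]);              % 5-by-5 map
net = train(net, x);
bmu = vec2ind(net(x));                % winning-neuron index per sample
pos = net.layers{1}.positions;        % grid coordinates of all neurons
winnerPos = pos(:, bmu);              % grid location of each sample's winner

Samples that share a winning neuron, or whose winners are neighbors on the grid, are candidates for the same cluster; winnerPos is also what you would overlay on a plot.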


Self-organizing map: How to identify clusters from plots?

I've been learning about neural networks and have most recently been trying out different clustering methods. But unlike KNN, GMM, or DBSCAN, there isn't a feature (in MATLAB, that I'm aware of) that identifies clusters for you. So I've been reading articles on how to interpret these plots, but I'm still confused.

For my example, in the weight positions plot I see one cluster. In the neighbor weight distances plot I see one, maybe two clusters (yellow/bright means similar, red/dark means dissimilar). That seems to be confirmed by the densities in the hits plot. There might be more, but I honestly can't tell (I'm new at this) because there is a gradient instead of a solid boundary between clusters. How many clusters do you see, and what's your logic? Thank you
net = selforgmap([5 5]);        % create a 5-by-5 SOM
[net,tr] = train(net,x);        % train on the data x
figure, plotsomnd(net)          % neighbor weight distances
figure, plotsomhits(net,x)      % sample hits per neuron
figure, plotsompos(net,x)       % weight positions
You can treat the SOM nodes as producing a new dataset: the codebook of weight vectors. This new dataset is distinct from the original one, yet it is arranged so that its underlying structure imitates that of the original data. That is why people often follow SOM with a clustering algorithm such as k-means or hierarchical clustering: instead of clustering a huge amount of original data directly, the clustering is performed on a smaller version of the dataset that still inherits its topology. AFAIK, SOM also differs from KNN in that SOM is unsupervised whereas KNN is supervised.
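A hedged sketch of that two-stage idea in MATLAB, continuing from a trained net and data x as above (the cluster count k is an assumption):

W = net.IW{1};                               % codebook: one weight vector per neuron
k = 3;                                       % assumed number of clusters
nodeLabel = kmeans(W, k, 'Replicates', 10);  % cluster the small codebook
bmu = vec2ind(net(x));                       % winning neuron per original sample
sampleLabel = nodeLabel(bmu);                % propagate node clusters to the data

Each original data point inherits the cluster of its winning neuron, so you get a conventional cluster labeling without running k-means on the full dataset.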

How is a heat map used in a CNN for crowd counting?

I'm new to neural networks, and in my reading I often come across heat maps being used in the network along with the ground truth provided with the dataset to (as far as I understand) evaluate the accuracy of the network's performance.
To be specific, consider a crowd density estimation network. The dataset provides the crowd images, and each image has a corresponding ground-truth .mat file containing:
a matrix of X and Y coordinates marking the appearance of each human head in the image;
the total number of human heads in the image (the crowd count), which equals the number of matrix rows.
My current understanding is that an image goes through the network and the result is compared with the given ground truth (either the head locations or the crowd count).
So, how and at which point is the crowd density map or heat map used? Do we generate one for the image while training and compare it with one generated from the ground truth? How is this done?
All the papers I've read neglect to describe this process.
Any clear explanation would be appreciated.
The counting-by-density CNN approaches you describe are fully convolutional regression networks whose output is a density map (heat map). That means the training ground-truth data is a density map too, and in general an MSE loss between the predicted and ground-truth maps is used for training. The ground-truth density maps are usually precomputed from the head-annotation ground truth.
E.g., if you look at the MCNN code's data preparation folder, you will find a get_density_map_gaussian.m file which performs the density estimation from the ground-truth annotated heads.
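As a hedged illustration of what such a script does (this is the general idea, not the MCNN code itself; sigma and the use of the Image Processing Toolbox's imgaussfilt are assumptions):

% points: K-by-2 [x y] head coordinates; H, W: image height and width
function d = density_map(points, H, W, sigma)
d = zeros(H, W);
for i = 1:size(points, 1)
    cx = min(max(round(points(i,1)), 1), W);   % clamp to the image
    cy = min(max(round(points(i,2)), 1), H);
    impulse = zeros(H, W);
    impulse(cy, cx) = 1;
    d = d + imgaussfilt(impulse, sigma);       % one ~unit-mass blob per head
end
end

Since each Gaussian integrates to approximately 1, sum(d(:)) recovers the crowd count, and during training the network's predicted map is compared against this ground-truth map with the MSE loss.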

Visualizing clusters using t-SNE

I have a dataset which I need to cluster and display such that elements in the same cluster appear closer together. The dataset comes from a research study and has around 16 rows (entries) and about 50 features. I agree it's not an ideal dataset to begin with, but unfortunately that's the situation at hand.
Here is the approach I took:
I first applied k-means on the dataset after normalizing it.
In parallel, I also used t-SNE to map the data into two dimensions and plotted the result on a scatter plot. From my understanding of t-SNE, that technique should already place items in the same cluster closer to each other. When I look at the scatter plot, however, the clusters are all over the place.
The resulting scatter plot can be found here: https://imgur.com/ZPhPjHB
Is this because t-SNE and k-means intrinsically work differently? Should I just run t-SNE and try to label the clusters (and if so, how?), or should I feed the t-SNE output into k-means somehow?
I am really new to this space, and advice would be greatly appreciated!
Thanks in advance
Edit: The same overlap happens if I first use t-SNE to reduce the dimensions to 2 and then cluster those reduced dimensions with k-means.
There is a difference between t-SNE and k-means. t-SNE is mostly used for visualization: it projects points from a high-dimensional space into 2D/3D while trying to preserve the neighborhood structure, so points that are close in the original space tend to stay close in the embedding (large distances, by contrast, are not faithfully preserved).
So t-SNE is not a real clustering method, and that's why you got that strange scatter plot.
Sometimes you need to apply PCA before t-SNE, but that is only needed when the number of features is large, just to speed up the computation.
As already advised, try hierarchical clustering, or simply gather more rows.
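A minimal MATLAB sketch of that advice (X is the assumed 16-by-50 data matrix and k an assumed cluster count; note that t-SNE's perplexity must stay below the number of rows, so the default of 30 would error with only 16 rows):

Xn = zscore(X);                           % normalize the features
Z = linkage(Xn, 'ward');                  % hierarchical clustering, as advised
lbl = cluster(Z, 'maxclust', k);          % cut the tree into k clusters
Y = tsne(Xn, 'Perplexity', 5);            % 2-D embedding of the 16 points
gscatter(Y(:,1), Y(:,2), lbl)             % color the embedding by cluster

If the colors still look scrambled, the embedding is telling you the clusters are not well separated, not necessarily that either step is wrong.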
Applying t-SNE and then fitting k-means on the embedding is one of the basic things you can start from.
I would also consider using a different f-divergence.
Stochastic Neighbor Embedding under f-divergences: https://arxiv.org/pdf/1811.01247.pdf
This paper tries five different f-divergence functions: KL, RKL, JS, CH (Chi-Square), and HL (Hellinger).
It goes over which divergence emphasizes what in terms of precision and recall.

Comparing k-means clustering

I have 150 images, 15 each of 10 different people. So basically I know which images should belong together if clustered.
These images have 73-dimensional feature vectors, and I clustered them into 10 clusters using the kmeans function in MATLAB.
Later, I processed these 150 data points, reducing their dimension from 73 to 3 for my work, and applied the same kmeans function to them.
I want to compare the results obtained on these two data sets (processed and unprocessed) from the same k-means function, to see whether the processing that reduced the dimensionality improves the clustering or not.
I thought comparing the variance of each cluster could be one parameter for comparison; however, I am not sure I can directly compare and evaluate the results (within-cluster sum of distances, etc.) since the two cases have different dimensions. Could anyone suggest a way to compare the k-means results, some way to normalize them, or any other comparison I can make?
I can think of three options, though I am unaware of any well-developed methodology for doing this specifically with k-means clustering.
1. Look at the confusion matrices between the two approaches (see the sketch after this list).
2. Compare the Mahalanobis distances between the clusters, and from the items in each cluster to their nearest other clusters.
3. Look at the Voronoi cells and see how far your points are from the cell boundaries.
The problem with option 3 is that the distance metrics get skewed: 3-D and 73-D distances are not commensurate, so I'm not a fan of that approach. I'd recommend reading some books on k-means if you are adamant about that path; rank speculation is fun, but standing on the shoulders of giants is better.
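Since the true identities are known in this case, option 1 is straightforward to sketch in MATLAB (X73, X3, and trueIDs are assumed variable names for the 73-D features, the 3-D features, and the known person labels):

lbl73 = kmeans(X73, 10, 'Replicates', 20);   % cluster the original features
lbl3  = kmeans(X3,  10, 'Replicates', 20);   % cluster the reduced features
C73 = confusionmat(trueIDs, lbl73);
C3  = confusionmat(trueIDs, lbl3);
% cluster indices are arbitrary, so score each clustering by its purity:
purity73 = sum(max(C73, [], 1)) / numel(trueIDs);
purity3  = sum(max(C3,  [], 1)) / numel(trueIDs);

Because purity depends only on the label assignments, it is directly comparable across the 73-D and 3-D runs, sidestepping the incommensurate-distance problem.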

Matlab: K-means clustering with predefined populations

I am trying to differentiate two populations. Each population is an N-by-M matrix in which N is fixed between the two and M is variable in length (N = column-specific attributes of each run, M = run number). I have looked at PCA and k-means for differentiating the two, but I am curious about the best practice.
To my knowledge, k-means has no initial 'calibration' in which the clusters are chosen so that known bimodal populations can be differentiated. It simply minimizes the distance and assigns the data to an arbitrary number of populations. I would like to tell the clustering algorithm that I want the best fit in which the two populations are separated. I can then use the fit I get from the initial clustering on future datasets. Any help, example code, or reading material would be appreciated.
-R
K-means and PCA are typically used in unsupervised learning problems, i.e. problems where you have a single batch of data and want to find an easier way to describe it. In principle, you could run k-means (with K=2) on your data, and then evaluate how well your two classes of data match up with the clusters found by the algorithm (note: you may want multiple starts).
It sounds like you have a supervised learning problem: you have a training data set which has already been partitioned into two classes. In this case k-nearest neighbors (as mentioned by @amas) is probably the approach most like k-means; however, Support Vector Machines can also be an attractive approach.
I frequently refer to The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition (Springer Series in Statistics) by Trevor Hastie, Robert Tibshirani, and Jerome Friedman.
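A hedged sketch of that supervised route in MATLAB (pop1 and pop2 are the assumed N-by-M population matrices with one run per column, and Xnew holds future runs in the same layout):

X = [pop1'; pop2'];                                   % one run per row
y = [ones(size(pop1,2),1); 2*ones(size(pop2,2),1)];   % known population labels
mdl = fitcsvm(X, y);                                  % linear SVM by default
pred = predict(mdl, Xnew');                           % classify future runs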
It really depends on the data. But just so you know, k-means can get stuck in local minima, so if you want to use it, try running it from several different random starting points. PCA might also be useful; however, as with other spectral methods, you have much less control over the clustering procedure. I recommend clustering the data using k-means with multiple random starting points and seeing how it works; you can then predict labels for new samples with k-NN (I don't know if that is useful for your case).
Check out lazy learners and k-NN for prediction.
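A sketch of that cluster-then-predict variant, reusing the assumed X and Xnew from the previous sketch:

lbl = kmeans(X, 2, 'Replicates', 20);         % multi-start k-means with K = 2
knn = fitcknn(X, lbl, 'NumNeighbors', 5);     % lazy learner on the cluster labels
pred = predict(knn, Xnew');                   % predict a label for each new run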