I am studying Self Organized Map (SOM) in the Neuron Nets field. So I have 2 questions:
1) Why neighborhood size is decreasing?
2) why not update just the winner? What would happen in this case?
Thanks in advance
The power of the SOM is to create a neural network that will be displayable and human readable, so:
1) The neighborhood size is decreased in order to get some stability in the algorithm increasing the iterations.
2) The mean of updating also the neighborhood is to create the map (that will be displayable) where near units have similar weights. If you update only the winner unit, the map will be not created since similar units will be left scattered in the map.
Related
i'm working on my thesis and i used a Catboost classifier to perform a binary analysis on a very unbalanced dataset:
class0 = x number of samples
class1 = 10*x number of samples
In order to optimize the performance of the model i changed the weights of the classes, giving an higher weight to the minority class, and then i performed a grid search cross validation in which it is searched the set of hyperparameters that reduces the crossentropy loss associated to the catboost model.
At this point i also changed the classificaiton threshold by maximizing the G-mean metric (sqruare root of sensitivity multiplied by specificity).
In you opinion, if you are experts or informed about ensemble methods of type boosting, is it right to procede in this way to increase the performance of the model when the dataset is unbalanced? Maybe it would be enough just to change the weights and use the grid search instead of changing also the classification threshold?
Thank you in advance!
Given a feature map of dimensionality MxNxC (for example, the output of a predicted Region of Interest from a Faster-RCNN), how would one reduce the spatial dimensions to be 1x1xC? I.e. reduce the feature map to be a vector like quantity summarizing the features of the region?
I am aware of the 1x1 Convolution, however this seems to be relevant in the channel reduction case. Average and Max Pooling also are commonly used, however it seems that these approaches are better suited to a less extreme subsampling case.
Obviously one may simply compute the mean over the spatial dimensions, however this seems rather coarse.
I recommend using of Global average pooling layer. You have MxNxC feature maps. Gloabal average pooling compute average for every feature map. So feature map becomes one number and set of features map becomes vector.
I recommend this article as starting point to exploring global average pooling layer.
https://alexisbcook.github.io/2017/global-average-pooling-layers-for-object-localization/
im new to neural netwworks, and through the readings i often come across where heat maps are used in the network along with the ground truth provided with the dataset to (as far as my understanding is) evaluate the accuracy of network performance.
to be specific, consider the application of a crowd density estimation network, the dataset provide the crowd images, each image hase a corresponding ground truth .mat file, this file has:
a matrix of X and Y coordinations representing the appearanse of human head in the image.
the total number of human heads in the image (crowd count) which is equal to the matrix rows.
my current understanding is that one image will get through the network and the result is a going to be compared with the given ground-truth (either the head locations, ore the crowd count),
SO, how and at which point and is the crowd density map or heat map is used? do we generate one for the image while training an compare it with the one generated from the ground truth? how is this done?
all the papers i've read neglect describing this process.
any clear explanation will be appreciated.
The counting-by-density CNN approache, which you describe are fully convolutional regression network where the output is a density (heat map). That mean the training ground-truth data is a density map too. In general a MSE loss is used for training. The ground-truth density maps are in general precomputed from the head detection ground-truth.
E.g. if you look at the MCNN code preparation folder you will find get_density_map_gaussian.m file which performs the density estimation from the ground-truth annotated head.
I have a question on self-organizing maps:
But first, here is my approach on implementing one:
The som neurons are stored in a basic array. Each neuron consists of a vector (another array of the size of the input neurons) of double values which are initialized to a random value.
As far as I understand the algorithm, this is actually all I need to implement it.
So, for the training I choose a sample of the training data at random an calculate the BMU using the Euclidian distance of sample's values and the neuron weights.
Afterwards I update it's weights and all other neurons in it's range depending on the neighborhood function and the learning rate.
Then, I decrease the neighborhood function and the learning rate.
This is done until a fixed amount of iterations.
My question is now: How do I determine the clusters after the training? My approach so far is to present a new input vector and calculate the min Euclidian distance between it and the BMU . But this seems a little naive to me. I'm sure that I've missed something.
There is no single correct way of doing that. As you noted, finding the BMU is one of them and the only one that makes sense if you just want to find the most similar cluster.
If you want to reconstruct your input vector, returning the BMU prototype works too, but may not be very precise (it is equivalent to the Nearest Neighbor rule or 1NN). Then you need to interpolate between neurons to find a better reconstruction. This could be done by weighting each neuron inversely proportional to their distance to the input vector and then computing the weighted average (this is equivalent to weighted KNN). You can also restrict this interpolation only to the BMU's neighbors, which will work faster and may give better results (this would be weighted 5NN). This technique was used here: The Continuous Interpolating Self-organizing Map.
You can see and experiment with those different options here: http://www.inf.ufrgs.br/~rcpinto/itm/ (not a SOM, but a close cousin). Click "Apply" to do regression on a curve using the reconstructed vectors, then check "Draw Regression" and try the different options.
BTW, the description of your implementation is correct.
A pretty common approach nowadays is the soft subspace clustering, where feature weights are added to find the most relevant features. You can use these weights to increase performance and improve the BMU calculation with euclidean distance.
I have big data set (time-series, about 50 parameters/values). I want to use Kohonen network to group similar data rows. I've read some about Kohonen neural networks, i understand idea of Kohonen network, but:
I don't know how to implement Kohonen with so many dimensions. I found example on CodeProject, but only with 2 or 3 dimensional input vector. When i have 50 parameters - shall i create 50 weights in my neurons?
I don't know how to update weights of winning neuron (how to calculate new weights?).
My english is not perfect and I don't understand everything I read about Kohonen network, especially descriptions of variables in formulas, thats why im asking.
One should distinguish the dimensionality of the map, which is usually low (e.g. 2 in the common case of a rectangular grid) and the dimensionality of the reference vectors which can be arbitrarily high without problems.
Look at http://www.psychology.mcmaster.ca/4i03/demos/competitive-demo.html for a nice example with 49-dimensional input vectors (7x7 pixel images). The Kohonen map in this case has the form of a one-dimensional ring of 8 units.
See also http://www.demogng.de for a java simulator for various Kohonen-like networks including ring-shaped ones like the one at McMasters. The reference vectors, however, are all 2-dimensional, but only for easier display. They could have arbitrary high dimensions without any change in the algorithms.
Yes, you would need 50 neurons. However, these types of networks are usually low dimensional as described in this self-organizing map article. I have never seen them use more than a few inputs.
You have to use an update formula. From the same article: Wv(s + 1) = Wv(s) + Θ(u, v, s) α(s)(D(t) - Wv(s))
yes, you'll need 50 inputs for each neuron
you basically do a linear interpolation between the neurons and the target (input) neuron, and use W(s + 1) = W(s) + Θ() * α(s) * (Input(t) - W(s)) with Θ being your neighbourhood function.
and you should update all your neurons, not only the winner
which function you use as a neighbourhood function depends on your actual problem.
a common property of such a function is that it has a value 1 when i=k and falls off with the distance euclidian distance. additionally it shrinks with time (in order to localize clusters).
simple neighbourhood functions include linear interpolation (up to a "maximum distance") or a gaussian function