k-nearest neighbour is a supervised algorithm and is well suited to classifying high-dimensional data. Could someone please mention a few unsupervised algorithms that can be used to classify (cluster) high-dimensional data items?
I'm new to machine learning. I need advice on what kinds of classification algorithms and techniques can be used to deal with a high-dimensional dataset (129 rows and 1900 columns) for a classification problem.
The algorithms remain the same as for any other classification problem, but you can try the following (see the sketch after this list):
Reduce dimensionality with PCA.
Use forward or backward feature-selection algorithms.
Remove highly correlated variables.
Use L1 regularisation with a high alpha value, since it does feature selection intrinsically.
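A minimal scikit-learn sketch of two of these ideas, assuming a dense feature matrix of shape (129, 1900) and binary labels (synthetic data stand in for the real dataset here; note that in scikit-learn the L1 strength is set through C, the inverse of alpha, so a strong penalty means a small C):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(129, 1900))        # placeholder for the real data
y = rng.integers(0, 2, size=129)

# Option 1: PCA down to a handful of components, then an ordinary classifier.
pca_clf = make_pipeline(StandardScaler(), PCA(n_components=20),
                        LogisticRegression(max_iter=1000))

# Option 2: L1-penalised logistic regression; a strong penalty (small C)
# drives most coefficients to zero, i.e. it selects features implicitly.
l1_clf = make_pipeline(StandardScaler(),
                       LogisticRegression(penalty="l1", solver="liblinear", C=0.1))

for name, clf in [("PCA + logistic", pca_clf), ("L1 logistic", l1_clf)]:
    scores = cross_val_score(clf, X, y, cv=5)
    print(name, scores.mean())
```

With only 129 rows and 1900 columns, cross-validation as above is important, since almost any flexible model can fit the training data perfectly.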
My aim is to classify the data into two sections, upper and lower, by finding the mid-line of the peaks.
I would like to apply machine learning methods, e.g. discriminant analysis.
Could you let me know how to do that in MATLAB?
It seems that what you are looking for is a GMM (Gaussian mixture model). With K = 2 (the number of components) and dimension equal to 1, this is a simple, fast method that gives you a direct solution. Given the fitted components, it is easy to find the local minimum of the density between them analytically (approximately a weighted average of the two means, with weights proportional to the standard deviations).
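A minimal scikit-learn sketch of this idea on synthetic 1-D values (in MATLAB, the equivalent model can be fitted with fitgmdist from the Statistics and Machine Learning Toolbox):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(1.0, 0.2, 200),    # "lower" peaks
                         rng.normal(4.0, 0.3, 200)])   # "upper" peaks

gmm = GaussianMixture(n_components=2, random_state=0).fit(values.reshape(-1, 1))
means = gmm.means_.ravel()
stds = np.sqrt(gmm.covariances_).ravel()

# Std-weighted average of the two means: the point where |x - m1|/s1 = |x - m2|/s2,
# used here as the "mid line" separating the two sections.
threshold = (means[0] * stds[1] + means[1] * stds[0]) / (stds[0] + stds[1])
labels = (values > threshold).astype(int)   # 1 = upper section, 0 = lower section
print(threshold)
```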
I am solving a classification problem. I train my unsupervised neural network on a set of entities (using the skip-gram architecture).
The way I evaluate is to search for the k nearest neighbours of each validation point among the training points. I take a weighted sum (weights based on distance) of the labels of the nearest neighbours and use that as the score for each validation point; a sketch of this procedure is given below.
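For concreteness, a small scikit-learn sketch of this evaluation scheme (function and variable names are illustrative, not from my code): each validation embedding is scored by an inverse-distance-weighted average of the labels of its k nearest training embeddings, and AUC is computed on those scores.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.metrics import roc_auc_score

def knn_scores(train_emb, train_labels, valid_emb, k=10):
    """Inverse-distance-weighted average of neighbour labels for each validation point."""
    nn = NearestNeighbors(n_neighbors=k).fit(train_emb)
    dist, idx = nn.kneighbors(valid_emb)
    weights = 1.0 / (dist + 1e-12)                 # closer neighbours count more
    weights /= weights.sum(axis=1, keepdims=True)  # normalise per validation point
    return (weights * train_labels[idx]).sum(axis=1)

# usage: auc = roc_auc_score(valid_labels, knn_scores(E_train, y_train, E_valid, k=10))
```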
Observation: as I increase the number of epochs (model 1: 600 epochs, model 2: 1400 epochs, model 3: 2000 epochs), my AUC improves at smaller values of k but saturates at similar values.
What could be a possible explanation of this behaviour?
[Reposted from CrossValidated]
To cross-check whether imbalanced classes are an issue, try fitting an SVM model. If that gives better classification (possible if your ANN is not very deep), it may be concluded that the classes should be balanced first.
Also, try some kernel functions to check whether such a transformation makes the data linearly separable.
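A minimal sketch of that sanity check, assuming the embeddings and labels are already in memory (synthetic imbalanced data are generated here just to make the snippet runnable): class_weight="balanced" compensates for the imbalance, and comparing the linear and RBF kernels probes whether a non-linear transformation helps.

```python
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

# stand-in for the real embeddings/labels: 90% / 10% class split
X, y = make_classification(n_samples=500, n_features=50,
                           weights=[0.9, 0.1], random_state=0)

for kernel in ("linear", "rbf"):
    svm = SVC(kernel=kernel, class_weight="balanced")
    scores = cross_val_score(svm, X, y, cv=5, scoring="roc_auc")
    print(kernel, scores.mean())
```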
I am finding it difficult to understand the difference between Self-Organizing Maps and Neural Gas. I read the Wikipedia article and the "Neural Gas Network Learns Topologies" article.
The SOM algorithm and the Neural Gas algorithm look very similar. In both, the winning neuron is found, it fires, and it excites its neighbouring neurons, where the neighbourhood is determined by a neighbourhood function. In Neural Gas the weights are adjusted as
$$w_i \leftarrow w_i + \varepsilon \, e^{-k_i/\lambda} \,(x - w_i),$$
where $k_i$ is the rank of neuron $i$ when all neurons are ordered by the distance of their weight vectors to the input $x$, and in SOM the weights are adjusted as
$$w_i \leftarrow w_i + \varepsilon \, h_\sigma\!\big(d_{\text{grid}}(i, i^*)\big)\,(x - w_i),$$
where $h_\sigma$ is a neighbourhood function (e.g. a Gaussian) of the grid distance between neuron $i$ and the winning neuron $i^*$.
They both look essentially the same to me, right?
What is the difference between the two algorithms?
In the article it says
I don't understand what is meant by this. Can someone please help me understand it?
SOM uses a set of neurons which are arranged in a predefined structure, and the neighbourhood is defined based on this structure. The picture below shows an example of such a structure.
(Figure: a SOM two-dimensional lattice.)
Neural Gas (NG), on the other hand, defines the neighbourhood based on the distances between neurons in the input (feature) space; no predefined structure exists.
In other words, SOM does ordered vector quantization, whereas NG does unordered vector quantization.
It is something like this: in SOM the neurons are labelled with numbers at the beginning, for example 1, 2, 3 and so on, and the neighbourhood is based on these numbers. For example, when neuron 1 is the BMU, neuron 2 is a neighbouring neuron.
In NG, when a neuron is selected as the BMU, the neurons whose weight vectors are closest to it in the feature space act as its neighbours, with the adaptation strength decreasing with their distance rank; see the sketch below.
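To make the difference concrete, here is a small NumPy sketch (not from the cited article; sizes and parameter values are arbitrary) of one adaptation step for each algorithm. The SOM neighbourhood is computed from distances on a fixed 5x5 grid, while the NG neighbourhood is computed from distance ranks in the feature space.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 25, 2                       # 25 neurons, 2-D inputs
W = rng.random((n, d))             # weight vectors shared by both examples
grid = np.array([(i, j) for i in range(5) for j in range(5)], float)  # 5x5 SOM lattice
eps, sigma, lam = 0.5, 1.0, 2.0    # arbitrary learning rate / neighbourhood widths

def som_step(W, x):
    bmu = np.argmin(np.linalg.norm(W - x, axis=1))
    # neighbourhood from distances on the fixed grid, not in feature space
    grid_dist = np.linalg.norm(grid - grid[bmu], axis=1)
    h = np.exp(-grid_dist**2 / (2 * sigma**2))
    return W + eps * h[:, None] * (x - W)

def ng_step(W, x):
    # neighbourhood from the rank of each neuron's distance to the input;
    # no grid is involved, which is why NG is "unordered" vector quantization
    dist = np.linalg.norm(W - x, axis=1)
    rank = np.argsort(np.argsort(dist))
    h = np.exp(-rank / lam)
    return W + eps * h[:, None] * (x - W)

x = rng.random(d)
W_som, W_ng = som_step(W.copy(), x), ng_step(W.copy(), x)
```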
What is the k-nearest-neighbour regression function in MATLAB? Is only the knn classification function available? Does anybody know of any useful literature regarding this?
Regards
Farideh
I don't believe the k-NN regression algorithm is directly implemented in MATLAB, but if you do some googling you can find some valid implementations. The algorithm is fairly simple, though:
Find the k-Nearest elements using whatever distance metric is suitable.
Compute the inverse-distance weight of each of the k elements.
Compute the weighted mean of the k elements' target values using those inverse-distance weights.
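A minimal NumPy sketch of those three steps (in MATLAB, the neighbour search itself can be done with knnsearch from the Statistics and Machine Learning Toolbox and the weighting written by hand in the same way):

```python
import numpy as np

def knn_regress(X_train, y_train, x_query, k=5):
    # 1. find the k nearest training points (Euclidean distance here)
    dist = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.argsort(dist)[:k]
    # 2. turn distances into inverse-distance weights
    w = 1.0 / (dist[idx] + 1e-12)
    # 3. weighted mean of the neighbours' target values
    return np.sum(w * y_train[idx]) / np.sum(w)

# usage on toy data
rng = np.random.default_rng(0)
X = rng.random((100, 3))
y = X.sum(axis=1)
print(knn_regress(X, y, rng.random(3), k=5))
```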