Parameter selection for SVM - MATLAB

I have a dataset which I use for classification with libSVM in MATLAB. The dataset consists of 4 classes.
For parameter selection of the SVM I can do nested cross-validation. The problem is that I also need the values of the best parameters in the end.
After doing the nested cross-validation and obtaining the final accuracy, I want the values of the best parameters. Then I will train an SVM for each class (one-vs-all) with the best parameters in order to select the most important features (according to the highest weights), i.e. a feature importance map.
How can I do this? Should I skip the nested cross-validation and instead just loop over all parameters, doing a cross-validation for each combination?
Second, if I use a linear SVM, then using the weight vector w to assign importance to features works, but does it also work for a non-linear SVM (e.g. an RBF kernel)?

To find the "best" parameters for your kernel of choice, you have to loop through all parameter combinations and perform a so-called "grid search". LIBSVM's MATLAB interface does not provide a built-in grid-search mechanism.
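A minimal grid-search sketch with the libSVM MATLAB interface, assuming y is an N-by-1 label vector and X an N-by-d feature matrix; with the -v option, svmtrain returns the cross-validation accuracy instead of a model (the search ranges below follow the usual libSVM guide defaults, adjust as needed):

    % Exponential grid over C and gamma, scored by 5-fold CV accuracy.
    bestAcc = -Inf; bestC = 1; bestGamma = 1;
    for log2c = -5:2:15
        for log2g = -15:2:3
            params = sprintf('-v 5 -c %g -g %g -q', 2^log2c, 2^log2g);
            acc = svmtrain(y, X, params);   % returns CV accuracy in percent
            if acc > bestAcc
                bestAcc = acc; bestC = 2^log2c; bestGamma = 2^log2g;
            end
        end
    end
    % Retrain one final model on all data with the winning parameters.
    model = svmtrain(y, X, sprintf('-c %g -g %g -q', bestC, bestGamma));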
Regarding your second question, I would suggest performing a feature selection (e.g. information gain, mutual information, ...) as a pre-processing step before the actual work with the SVM, and only in a second step taking the weight vector w into consideration (I am not sure this works with RBF or Gaussian kernels, since there w lives in the implicit feature space rather than the input space).
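For the linear case, the primal weight vector can be recovered directly from a libSVM model; a minimal sketch for one binary one-vs-all subproblem (labels in {+1, -1}):

    % Train a linear SVM (-t 0) and reconstruct the primal weight vector.
    % For non-linear kernels (e.g. RBF) this reconstruction is not
    % available, because w lives in the implicit feature space.
    model = svmtrain(y, X, '-t 0 -c 1 -q');
    w = model.SVs' * model.sv_coef;          % d-by-1 weight vector
    b = -model.rho;                          % bias term
    [~, ranking] = sort(abs(w), 'descend');  % features ranked by |w|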

Related

Auto-encoder based unsupervised clustering

I am trying to cluster a dataset using an autoencoder, and since I am new to this field I can't tell how to do it. My main issue is how to define the loss function, since the dataset is unlabeled; up to now, what I have seen in the literature defines the loss function as the distance between the desired output and the predicted output. My question is: since I don't have a desired output, how should I implement this?
You can use an autoencoder to pre-train your convolutional layers, as described in my question here about using a convolutional autoencoder for images.
As you can see from the code, the optimizer is Adam and the metrics are accuracy and the Dice coefficient; I think you can use accuracy only, since the Dice coefficient is image-specific.
I'm not sure how it will work for you, because you haven't explained how you will transform your bibliography lists into vectors; perhaps you will create a list of bibliography IDs sorted by the cosine distance between them.
For example, you could use a set of vectors with the cosine distances to each item in the bibliography list above for each reference in your dataset, and use it as the input for the autoencoder.
After the encoder is trained, you can remove the decoder part from your model, take the encoder's output, and use it as the input for one of the unsupervised clustering algorithms, for example k-means. You can find details about them here.
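If you want to prototype this in MATLAB (the environment used elsewhere on this page) rather than Keras, a minimal sketch of the same pipeline with trainAutoencoder from the Neural Network Toolbox, assuming your references have already been turned into fixed-length numeric vectors (the hidden size and cluster count are placeholders):

    % X: d-by-N matrix, one column per sample (e.g. cosine-distance vectors).
    hiddenSize = 10;                     % assumed size of the encoding
    autoenc = trainAutoencoder(X, hiddenSize, 'MaxEpochs', 200);
    Z = encode(autoenc, X);              % hiddenSize-by-N encoded features
    % Discard the decoder and cluster the encoded representation.
    k = 5;                               % assumed number of clusters
    labels = kmeans(Z', k);              % kmeans expects observations as rows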

How do I identify which features are being selected with LDA?

I have run LDA in MATLAB using the fitcdiscr and predict functions.
I have a feeling there may be some bugs in my code, however, and as a sanity check I would like to identify which features are most heavily weighted in the classification.
Can this be done?
There is a Coeffs field in your fitted object containing all the relevant information: http://uk.mathworks.com/help/stats/classificationdiscriminant-class.html
In particular, if you fit a linear LDA there will be a Linear field, which is the linear operator used for projection. However, one should bear in mind that the coefficient values of linear models are not feature importances; there is much more to consider. A weight can be big because your feature has small values, or because the distribution of its values is highly skewed. If you need a feature selection technique, use feature selection methods (like L1-regularized models); otherwise you might easily draw wrong conclusions from your data.
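A minimal sketch of inspecting those coefficients (the zscore standardization is my suggestion to make the magnitudes roughly comparable, not part of the original answer):

    % Standardize features, fit a linear discriminant, inspect coefficients.
    Xz = zscore(X);
    mdl = fitcdiscr(Xz, Y, 'DiscrimType', 'linear');
    C = mdl.Coeffs(1, 2);           % boundary between classes 1 and 2
    w = C.Linear;                   % one coefficient per feature
    c = C.Const;                    % constant offset of that boundary
    % Rough ranking only; see the caveats above about reading
    % coefficients as importances.
    [~, ranking] = sort(abs(w), 'descend');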

Self-organizing maps and learning vector quantization

Self-organizing maps are more suited for clustering (dimension reduction) than for classification, but SOMs are used in learning vector quantization (LVQ) for fine-tuning. LVQ, however, is a supervised learning method, so to use SOMs with LVQ, the LVQ must be provided with a labelled training data set. But since SOMs only do clustering, not classification, and thus cannot produce labelled data, how can a SOM be used as an input for LVQ?
Does LVQ fine-tune the clusters in the SOM?
Before being used in LVQ, should the SOM be put through another classification algorithm so that it can classify the inputs, so that these labelled inputs may be used in LVQ?
It must be clear that supervised learning differs from unsupervised learning in that, in the former, the target values are known.
Therefore, the output of supervised models is a prediction.
The output of unsupervised models, instead, is a label whose meaning we don't know yet. For this reason, after clustering, it is necessary to profile each of those new labels.
Having said so, you could label the dataset using an unsupervised technique such as a SOM. Then you should profile each class in order to be sure you understand the meaning of each class.
At this point, you can pursue two different paths depending on your final objective:
1. use this new variable as a way to reduce dimensionality
2. use this new dataset, with the additional variable representing the class, as labelled data that you will try to predict using LVQ
Hope this can be useful!
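A minimal sketch of the second path with selforgmap and lvqnet from the Neural Network Toolbox, assuming X is a d-by-N matrix with one column per sample (map size and neuron count are placeholders):

    % Step 1: label the data with a SOM (unsupervised).
    som = selforgmap([4 4]);          % 4x4 map, i.e. up to 16 clusters
    som = train(som, X);
    clusterIdx = vec2ind(som(X));     % 1-by-N cluster assignments
    % Step 2: after profiling the clusters, use them as labels for LVQ.
    T = ind2vec(clusterIdx);          % one-hot target matrix
    lvq = lvqnet(16);                 % number of competitive neurons
    lvq = train(lvq, X, T);
    pred = vec2ind(lvq(X));           % LVQ class predictions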

How to see which attribute (feature) contributes most to the performance of the classification with PCA in MATLAB?

I would like to perform classification on a small data set (65x9) using some of the machine learning classification methods (SVM, decision trees, or any other).
So, before starting with the classification, I would like to do attribute analysis with PCA in MATLAB or Weka (MATLAB preferred). I would like to find out which attributes contribute most to the performance of the classifier, so that I can perhaps reduce the number of attributes and/or include more in the future. Are there any examples of PCA for this in MATLAB or Weka?
Thanks
PCA is an unsupervised feature extraction method.
If your question is about selecting attributes to use with PCA: I don't know what your purpose is, but it is unnecessary to do something like that to improve classification performance. Just use all the attributes; PCA will give you new features (principal components) ordered by decreasing explained variance.
If your question is about selecting components after PCA: you can choose a threshold (for example 0.95) and count how many components are needed to reach that threshold, starting from the first. You can use the eigenvalues of the covariance matrix to compute the explained variance and apply the threshold.
After running PCA, the first component explains the most variance, the second explains the most of what remains, and so on.
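A minimal sketch of the threshold approach in MATLAB; the explained output of pca is the normalized eigenvalue spectrum of the covariance matrix, so this is equivalent to the eigenvalue computation described above:

    % X: n-by-d data matrix, rows are observations; pca centers X by default.
    [coeff, score, latent, ~, explained] = pca(X);
    cumVar = cumsum(explained);       % cumulative % of variance explained
    k = find(cumVar >= 95, 1);        % components needed for a 95% threshold
    Xreduced = score(:, 1:k);         % data projected onto the first k PCs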

Genetic Algorithm After SVM

I have already applied an SVM using LIBSVM. Now I would like to implement a genetic algorithm (GA) for feature selection. I tried to google for some information:
1) Saw this website : http://www.scribd.com/doc/31235552/Genetic-Algorithm-Implementation-Using-Matlab
2) GA Examples in MATLAB : http://www.mathworks.com/help/toolbox/gads/f6691.html
I have a few questions about them:
Q1) [x fval] = ga(@fitnessfun, nvars, options). This is the function to run the GA solver. What should fitnessfun be? In most GA examples it is a polynomial function, but in the case of an SVM, what should fitnessfun be?
Q2) Are there any concrete examples of GA after SVM?
I'd like to hear some feedback.
Thanks in advance.
If you want to do feature selection, I think you have it backwards: you should run the GA for feature selection before training your SVM. Your fitness function could be the performance of a newly trained SVM on the selected features; it depends on what you want to accomplish. Can't say you were very clear on this topic.
To answer your second comment:
There are many parts. I don't know this ga function you are using, but if you take a look at the documentation it must tell you somewhere what parameters this fitnessfun should expect. I'm guessing the individual for which you want to evaluate fitness is the main parameter for this function. If you evolve a selection of features, this individual would be an array of Boolean variables, where true indicates a feature that is selected and false indicates a feature that is not selected. This fitness function needs to return an indicator of how well this selection of features fares, i.e. it must return a higher number for a better selection and a lower number for a worse selection. Prediction accuracy might be a good value for this (number of correct predictions divided by the total number of samples).
I'm going to assume you know how to calculate the prediction accuracy of an SVM model given a dataset and its labels. Since you have a pre-trained SVM, it might be a bit tricky to use it on only the selected features, and this depends strongly upon the implementation of your SVM. If it is a linear SVM, you could just set the values of the non-selected features to zero in the data matrix. However, if it is an RBF SVM, that won't work; you will need to understand the inner mechanisms of the SVM implementation you are relying on. I suggest making a simple example where you train an SVM on 3D data and then adapt it to work on 2D data.
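Following the retrain-per-candidate idea from the first paragraph, a minimal sketch with MATLAB's ga and the libSVM interface; note that ga minimizes, so the fitness returns the cross-validation error rather than the accuracy (function and variable names are placeholders):

    % Fitness for a candidate feature mask: 5-fold CV error of an SVM
    % retrained on the selected columns only. ga minimizes this value.
    function err = svmFeatureFitness(mask, X, y)
        if ~any(mask)
            err = 1;    % empty selection: worst possible fitness
            return;
        end
        acc = svmtrain(y, X(:, logical(mask)), '-v 5 -q');  % CV accuracy (%)
        err = 1 - acc / 100;
    end

Then evolve a Boolean mask over the d features:

    nvars = size(X, 2);
    opts = gaoptimset('PopulationType', 'bitstring');
    fitness = @(mask) svmFeatureFitness(mask, X, y);
    [bestMask, bestErr] = ga(fitness, nvars, [], [], [], [], [], [], [], opts);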