My model predicts all positive-class samples as negative (and vice versa) with the LIBSVM toolbox in MATLAB

I created a classifier using the LIBSVM toolbox in MATLAB. It classifies all positive-class data as negative and vice versa. I got good results during cross-validation, but when testing on new data I find that the classifier behaves the wrong way. I can't figure out where the problem lies.
Can anybody please help me with this?

This was a "feature" of prior versions of libsvm when the first training example's (binary) label was -1. The easiest solution is to get the latest version (> 3.17).
See here for more details: http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f430
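If upgrading is not an option, you can make the label order explicit yourself. The sketch below assumes LIBSVM's convention that, for a binary model, a positive decision value corresponds to the first entry of `model.Label` (the order in which labels were seen during training in older versions). `labelOrder` and `decValues` are hypothetical stand-ins for `model.Label` and the third output of `svmpredict`.

```matlab
% Hypothetical stand-ins: in practice labelOrder = model.Label (from svmtrain)
% and decValues is the third output of svmpredict.
labelOrder = [-1; 1];          % -1 happened to be the first training label
decValues  = [0.8; -1.2; 0.3];

% Assumed convention: decision value > 0 -> labelOrder(1), otherwise labelOrder(2)
predicted = repmat(labelOrder(2), size(decValues));
predicted(decValues > 0) = labelOrder(1);

% Flip the sign so that a positive value always points towards class +1
if labelOrder(1) == -1
    decForPositive = -decValues;
else
    decForPositive = decValues;
end

disp(predicted')        % -1  1  -1
disp(decForPositive')   % -0.8  1.2  -0.3
```

Checking `model.Label` after training is cheap and tells you immediately which class a positive decision value refers to.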

Suppose you have 500 training instances: 250 positive and 250 negative. In the test set, instances whose features resemble the positive training instances will be predicted as positive. But the test labels you pass to LIBSVM (you have to provide them so that LIBSVM can calculate accuracy; they are not used by the prediction itself) may have been reversed by mistake. Then it looks as if your predicted labels have come out exactly opposite. The accuracy itself is the clue: even a random classifier gets about 50% accuracy on a balanced binary problem, so an accuracy far below 50% points to flipped test labels rather than a bad model.
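A quick way to see the effect is to compute accuracy against both the given labels and their negation. This is a minimal base-MATLAB sketch with hypothetical ±1 labels; for ±1 labels the two accuracies always add up to 100%, so an accuracy close to 0% means the labels are almost certainly reversed.

```matlab
% Hypothetical predictions and (correct) test labels, both coded as +1/-1
predicted  = [ 1;  1; -1; -1;  1];
testLabels = [ 1;  1; -1; -1; -1];

accGiven   = 100 * sum(predicted == testLabels)  / numel(testLabels);  % 80
accFlipped = 100 * sum(predicted == -testLabels) / numel(testLabels);  % 20

fprintf('accuracy with given labels:   %g%%\n', accGiven);
fprintf('accuracy with flipped labels: %g%%\n', accFlipped);
```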

Related

SVM Matlab classification

I'm working on a 4-class classification problem. It's not particularly unbalanced, there are no missing features, and there are a lot of observations. Everything seems fine, but when I classify with fitcecoc it assigns everything to the first class. I tried fitclinear and fitcsvm on one-vs-all decomposed data and got the same results. Do you have any idea what causes this problem?
Here are a few recommendations:
Have you normalized your data? SVM is sensitive to features being on different scales.
Save the mean and std you obtain during training and use those values to normalize the test samples during prediction.
Change the C value and see if that changes the results.
I hope these help.
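The second recommendation can be sketched in base MATLAB as follows. The matrices are hypothetical; the key point is that `mu` and `sigma` come from the training set only and are reused unchanged for the test set. (`bsxfun` is used so it also runs on releases without implicit expansion; fitcsvm's `'Standardize',true` option does this standardization internally.)

```matlab
% Hypothetical data: two features on very different scales
Xtrain = [1 100; 2 200; 3 300];
Xtest  = [2 250];

% Statistics estimated on the TRAINING set only
mu    = mean(Xtrain, 1);
sigma = std(Xtrain, 0, 1);
sigma(sigma == 0) = 1;          % guard against constant features

% Apply the same transformation to both sets
XtrainN = bsxfun(@rdivide, bsxfun(@minus, Xtrain, mu), sigma);
XtestN  = bsxfun(@rdivide, bsxfun(@minus, Xtest,  mu), sigma);

disp(XtestN)   % 0  0.5
```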

One class learning to make predictions using MATLAB

I am using MATLAB to build a prediction model with a binary target.
The problem is that the negative observations in my training data may actually be positives that simply were not detected.
I started with a logistic regression model, assuming the data is accurate, and the results were less than satisfactory. After some research, I moved to one-class learning, hoping to focus only on the part of the data (the positives) that I am certain about.
I looked up the related material in the MATLAB documentation and found that I can use fitcsvm for this.
My current questions are:
Am I on the right path? Can one-class learning solve my problem?
I tried to use fitcsvm to create a ClassificationSVM using all the positive observations that I have:
model = fitcsvm(Instance,Label,'KernelScale','auto','Standardize',true)
However, when I use the model to predict:
[label,score] = predict(model,Test)
All the labels predicted for my test cases are 1. I think I did something wrong. Should I feed the SVM only the positive observations that I have?
If not, what should I do?

Matlab fitcsvm Feature Coefficients

I'm running a series of SVM classifiers for a binary classification problem, and am getting very nice results as far as classification accuracy.
The next step of my analysis is to understand how the different features contribute to the classification. According to the documentation, MATLAB's fitcsvm function returns a ClassificationSVM object, SVMModel, which has a property called "Beta", defined as:
Numeric vector of trained classifier coefficients from the primal linear problem. Beta has length equal to the number of predictors (i.e., size(SVMModel.X,2)).
I'm not quite sure how to interpret these values. I assume higher values represent a greater contribution of a given feature to the support vector? What do negative weights mean? Are these weights somehow analogous to beta parameters in a linear regression model?
Thanks for any help and suggestions.
----UPDATE 3/5/15----
Looking more closely at the equations describing the linear SVM, I'm pretty sure Beta must correspond to w in the primal form.
The only other parameter is b, which is just the offset.
Given that, and given this explanation, it seems that taking the square or absolute value of the coefficients provides a metric of relative importance of each feature.
As I understand it, this interpretation only holds for the linear binary SVM problem.
Does that all seem reasonable to people?
Intuitively, one can think of the absolute value of a feature weight as a measure of its importance. However, this is not true in the general case, because the weights describe how much a marginal change in the feature value would affect the output, which means they depend on the feature's scale. For instance, if we have a feature "age" measured in years and then change it to months, the corresponding coefficient will be divided by 12, but clearly that doesn't mean age is less important now!
The solution is to scale the data (which is usually a good practice anyway).
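The age example can be checked numerically. This base-MATLAB sketch uses a hypothetical weight `wYears`: rescaling the weight by 1/12 leaves every contribution w*x unchanged, while after standardization the feature is identical in both units, so any weight learned on it no longer depends on the unit.

```matlab
ageYears  = [20; 35; 50; 65];
ageMonths = 12 * ageYears;

wYears  = 0.6;           % hypothetical weight learned on age in years
wMonths = wYears / 12;   % weight giving the same contribution in months

contribYears  = wYears  * ageYears;    % per-sample contribution w*x
contribMonths = wMonths * ageMonths;   % same values, 12x smaller weight

% Standardize each version with its own mean and standard deviation
zYears  = (ageYears  - mean(ageYears))  / std(ageYears);
zMonths = (ageMonths - mean(ageMonths)) / std(ageMonths);
```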
If the data is scaled, your intuition is correct, and in fact there is a feature selection method that does just that: choosing the features with the highest absolute weight. See http://jmlr.csail.mit.edu/proceedings/papers/v3/chang08a/chang08a.pdf
Note that this holds only for linear SVMs.
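Ranking features by absolute weight is a one-liner once you have the coefficient vector. In the sketch below, `beta` and `featureNames` are hypothetical values standing in for `SVMModel.Beta` from a linear-kernel model trained on standardized predictors. The sign only tells you the direction: in a linear score f(x) = x'*beta + b, a positive weight pushes towards the positive class and a negative weight towards the negative class.

```matlab
% Hypothetical coefficients standing in for SVMModel.Beta (linear kernel,
% standardized predictors) and hypothetical feature names
featureNames = {'age', 'income', 'height'};
beta = [0.2; -1.5; 0.7];

% Importance = magnitude of the weight; the sign is the direction of the effect
[~, order] = sort(abs(beta), 'descend');
ranked = featureNames(order);

disp(ranked)   % income, height, age
```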

One Class SVM using LibSVM in Matlab - Conceptual

Perhaps this is an easy question, but I want to make sure I understand the conceptual basis of the LibSVM implementation of one-class SVMs and whether what I am doing is permissible.
I am using one-class SVMs for outlier detection and removal, as a data preprocessing step in a larger time series prediction model. I have a Y vector (the quantity we are trying to predict, which is continuous, not class labels) and an X matrix (continuous features used for prediction). Since I want to detect outliers early in preprocessing, I have not yet normalized or lagged the X matrix for use in prediction, nor detrended, denoised, or otherwise processed the Y vector (which is already scaled to within [-1,1]). My main question is whether it is correct to model the one-class SVM like this (using LIBSVM):
svmod = svmtrain(ones(size(Y,1),1),Y,'-s 2 -t 2 -g 0.00001 -n 0.01');
[od,~,~] = svmpredict(ones(size(Y,1),1),Y,svmod);
The resulting model does yield performance roughly in line with what I expect (about 99% prediction accuracy, meaning 1% of the observations are outliers). I ask because in other questions about one-class SVMs, people appear to use their X matrices where I use Y. Thanks for your help.
What you are doing here is nothing more than a fancy range check. If you are not willing to use X to find outliers in Y (even though you really should), it would be a lot simpler and better to just check the distribution of Y to find outliers instead of this improvised SVM solution (for example, remove the values below the 0.5th percentile and above the 99.5th percentile of Y).
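A minimal base-MATLAB version of that range check is shown below. It trims by count (the k smallest and k largest values) rather than by interpolated percentiles, so it needs no toolbox; the series `Y` and the injected spike are hypothetical.

```matlab
% Hypothetical series in [-1, 1] with one injected spike
Y = linspace(-1, 1, 1000)';
Y(500) = 5;

trimFraction = 0.005;                 % 0.5% at each end
n = numel(Y);
k = floor(trimFraction * n);          % number of values trimmed per tail

[~, order] = sort(Y);                 % ascending
isOutlier = false(n, 1);
isOutlier(order(1:k)) = true;         % lowest k values
isOutlier(order(end-k+1:end)) = true; % highest k values (includes the spike)

Yclean = Y(~isOutlier);
```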
In reality, this is probably not even close to what you really want to do. With this setup you are rejecting Y values as outliers without considering any context (e.g. X). Why are you using RBF and how did you come up with that specific value for gamma? A kernel is total overkill for one-dimensional data.
Secondly, you are training and testing on the same data (Y). A kitten dies every time this happens. One-class SVM attempts to build a model that recognizes the training data; it should not be evaluated on the same data it was built with. Please think of the kittens.
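For a time series, a simple way to keep training and evaluation data apart is a chronological holdout. This sketch uses a hypothetical series and an assumed 80/20 split; the LIBSVM calls are left as comments (they need LIBSVM's MATLAB interface on the path) and reuse the question's options, since only the data split changes here.

```matlab
% Hypothetical series standing in for Y
Y = sin(linspace(0, 20, 500))';

trainFraction = 0.8;                   % assumed split ratio
nTrain = floor(trainFraction * numel(Y));
Ytrain = Y(1:nTrain);                  % earlier part: fit the model
Ytest  = Y(nTrain+1:end);              % later part: evaluate the model

% With LIBSVM on the path:
% svmod = svmtrain(ones(nTrain,1), Ytrain, '-s 2 -t 2 -g 0.00001 -n 0.01');
% od = svmpredict(ones(numel(Ytest),1), Ytest, svmod);
```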
Additionally, note that the nu parameter of one-class SVM controls the number of outliers the classifier will accept. This is explained in the LIBSVM implementation document (page 4): "It is proved that nu is an upper bound on the fraction of training errors and a lower bound of the fraction of support vectors." In other words: your training options specifically state that up to 1% of the data can be rejected. For one-class SVM, replace "can" with "should".
So when you say that the resulting model "does yield performance somewhat in line with what I would expect"... of course it does, by definition. Since you have set nu=0.01, 1% of the data is rejected by the model and thus flagged as an outlier.
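The quoted bound can be written compactly, with n denoting the number of training points (notation assumed here, not taken from the LIBSVM document):

```latex
\frac{\#\{\text{training errors}\}}{n} \;\le\; \nu \;\le\; \frac{\#\{\text{support vectors}\}}{n}
```

For example, with n = 1000 observations and nu = 0.01, at most 10 training points can end up outside the learned region (i.e. flagged as outliers), and at least 10 points are support vectors.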

Parameters for Support Vector Regression using LibSVM in Matlab

I am trying to use LibSVM for regression. I am trying to detect faces (10 classes of different faces). I labeled classes 1-10 as faces and 11 as non-face. I want to develop a script using LibSVM that gives me a continuous score between 0 and 1 if the test image falls into any of the 10 face classes, and -1 (non-face) otherwise. From this score I could predict the class: if the test image matches the 1st class, the score should be around 0.1; similarly, if it matches class 10, the score should be around 1 (any continuous value close to 1). I am trying to use SVR in LibSVM to solve this problem. I can easily get the predicted class through classification, but I want a continuous score, which I can get through regression. I searched online for the function, or the function parameters, for SVR in LibSVM, but I couldn't find anything. Can anybody please help me with this?
This is not a regression problem. Solving it through regression will not yield good results.
You are dealing with a multiclass classification problem. The best way to approach this is to construct 10 one-vs-all classifiers with probabilistic output. To get probabilistic output (e.g. in the interval [0,1]), you can train and predict with the -b 1 option for C-SVC (-s 0).
If any of the 10 classifiers yields a sufficiently large probability for its positive class, you use that probability (which is close to 1). If none of the 10 classifiers yields a positive label with high enough confidence, you can default to -1.
So in short: make a multiclass classifier containing one-vs-all classifiers with probabilistic output. Subsequently post-process the predictions as I described, using a probability threshold of your choice (for example 0.7).
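The post-processing step can be sketched in base MATLAB. `P` is a hypothetical matrix of positive-class probabilities (3 classes instead of 10 to keep it short); the commented LIBSVM lines show one assumed way to fill it, relying on the probability columns returned by `svmpredict` with `-b 1` being ordered as `model.Label`.

```matlab
% Assumed way to obtain P with LIBSVM's MATLAB interface (trainLabels in 1..10,
% non-face images labeled 11):
% for k = 1:10
%     models{k} = svmtrain(double(trainLabels == k), trainData, '-s 0 -b 1');
%     [~, ~, prob] = svmpredict(testLabels, testData, models{k}, '-b 1');
%     P(:, k) = prob(:, models{k}.Label == 1);
% end

% Hypothetical probabilities: rows = test images, columns = face classes
P = [0.90 0.05 0.10;
     0.20 0.30 0.10;
     0.10 0.80 0.75];
threshold = 0.7;                          % example threshold from the answer

[bestProb, bestClass] = max(P, [], 2);    % most confident classifier per image
isFace = bestProb >= threshold;

predictedClass = -ones(size(bestClass));  % -1 means non-face
predictedClass(isFace) = bestClass(isFace);
score = -ones(size(bestProb));
score(isFace) = bestProb(isFace);

disp([predictedClass score])   % [1 0.9; -1 -1; 2 0.8]
```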