I am using MATLAB to build a prediction model whose target is binary.
The problem is that the negative observations in my training data may actually be positives that simply were not detected.
I started with a logistic regression model, assuming the data were accurate, and the results were less than satisfactory. After some research I moved to one-class learning, hoping to focus on only the part of the data (the positives) that I am certain about.
I looked up the relevant material in the MATLAB documentation and found that I can use fitcsvm to proceed.
My current problem is:
Am I on the right path? Can one-class learning solve my problem?
I tried to use fitcsvm to create a ClassificationSVM using all the positive observations that I have.
model = fitcsvm(Instance,Label,'KernelScale','auto','Standardize',true)
However, when I try to use the model to predict
[label,score] = predict(model,Test)
All the labels predicted for my Test cases are 1. I think I did something wrong. Should I feed the SVM only the positive observations that I have?
If not, what should I do?
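For reference, a minimal sketch of how scoring is usually handled in one-class mode (the 'Nu' value here is just an illustrative guess): with a single training class, predict can only ever return that one class label, so the sign of the score is what separates inliers from outliers.

% One-class SVM: train on the positives only (labels all the same class).
% 'Nu' roughly controls the fraction of training points allowed outside
% the boundary; 0.1 is an illustrative guess, not a recommendation.
oneClassModel = fitcsvm(Instance, ones(size(Instance,1),1), ...
    'KernelScale','auto', 'Standardize',true, 'Nu',0.1);

% predict always returns the single training class as the label, so
% threshold the score instead: negative scores mark outliers.
[~, score] = predict(oneClassModel, Test);
looksPositive = score >= 0;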
Hello problem solvers,
I am currently analyzing some fMRI data from a visual task and am stuck on a problem. The idea was to train an SVM classifier on some data and then use the weight vector to project novel data onto it. However, the model struct created by fitcsvm (MATLAB 2018a) only sometimes actually contains a weight vector. In many cases mdl.Beta is empty.
mdl = fitcsvm(training, labels)   % default two-class fit (linear kernel)
weights = mdl.Beta                % linear weight vector; sometimes empty
I looked into why that could be and made sure the input is double, is not all zeros, and does not contain NaNs. So far I have not been able to identify a rule for why it sometimes returns empty and sometimes not. If anything, it seems that as I increase the amount of input data, mdl.Beta is less frequently empty. But I can only change so much about that. :(
I am happy about any help!
Thanks!
Edit: Unfortunately, as is so often the case with fMRI, training data is quite limited. The training data consists of 50 to 300 features (depending on brain area), with one example for class A and five examples for class B.
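In case it is useful, a small sketch of two things to try (assuming the goal is a linear weight vector; variable names match the snippet above). mdl.Beta is only populated for linear-kernel models, and for a linear SVM the same vector can be reconstructed from the support vectors as w = sum_i alpha_i * y_i * x_i:

% Request the linear kernel explicitly; Beta is empty for other kernels.
mdl = fitcsvm(training, labels, 'KernelFunction','linear');
weights = mdl.Beta;

% Fallback: rebuild w from the support vectors. Note that if
% 'Standardize',true was used, SupportVectors are stored in standardized
% units, so the reconstructed weights live in that space too.
if isempty(weights)
    weights = ((mdl.Alpha .* mdl.SupportVectorLabels)' * mdl.SupportVectors)';
end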
I'm approaching a 4-class classification problem. It's not particularly unbalanced, there are no missing features, and there are a lot of observations. Everything seems fine, but when I approach the classification with fitcecoc it classifies everything as part of the first class. I tried to use fitclinear and fitcsvm on one-vs-all decomposed data but got the same results. Do you have any clue about the reason for this problem?
Here are a few recommendations:
1. Have you normalized your data? SVM is sensitive to the features being on different scales.
2. Save the mean and std you obtain during training and use those values during the prediction phase to normalize the test samples (see the sketch below).
3. Change the C value and see if that changes the results.
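A minimal sketch of the points above (variable names are illustrative):

% z-score with statistics computed on the TRAINING data only.
mu    = mean(Xtrain, 1);
sigma = std(Xtrain, 0, 1);
XtrainNorm = (Xtrain - mu) ./ sigma;   % needs R2016b+; use bsxfun otherwise

% Reuse the SAME training mu/sigma on the test samples.
XtestNorm = (Xtest - mu) ./ sigma;

% The C value is the 'BoxConstraint' parameter in fitcsvm/templateSVM.
tmpl = templateSVM('BoxConstraint', 10);
mdl  = fitcecoc(XtrainNorm, Ytrain, 'Learners', tmpl);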
I hope these help.
I have my training data with a binary column at the end, and I need to run a classifier that gives me a numerical probability that it's correct. I've tried running it with the linear regression classifier and I get some negative numbers in the prediction column. I also tried it with the lazy IBk classifier but only got predictions (of 1) where the binary column was 1.
Yeah, not all algorithms provide that information. Naive Bayes is a standard for that sort of thing and works surprisingly well under many conditions.
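For illustration only, here is what per-class probability output from Naive Bayes looks like in MATLAB (variable names are made up; in Weka the analogous information comes from the classifier's distributionForInstance output):

% fitcnb returns a posterior probability per class, not just a label.
nbModel = fitcnb(Xtrain, Ytrain);
[predLabel, posterior] = predict(nbModel, Xtest);
% Each row of posterior sums to 1; the column order follows
% nbModel.ClassNames, so posterior(i,2) is P(class 2 | Xtest(i,:)).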
I had great results with RandomCommittee where probabilities were a key factor in our application. Not only did we have to pick the top of the list, but we needed to know if there were any close runners-up. When this happened, the application paused and asked the user for clarification (which was stored back in the database, of course). RandomCommittee was chosen after extensive testing over time.
I want to use the SOM Toolbox (http://www.cis.hut.fi/somtoolbox/theory/somalgorithm.shtml) for predicting missing values or outliers, but I can't find any function for it.
I wrote code for visualization and for getting the BMU (best matching unit), but I don't know how to use it for prediction. Could you help me?
Thank you in advance.
If this still interests you, here is one solution.
Train your network on a training set containing all the inputs that you will later analyze. After learning, you present the new test data with only the inputs that you have. The network gives you back the best matching unit (for the features you have), and through it you can access the values of the features you do not have or suspect to be outliers.
This of course leads to different learning and prediction implementations. The learning you implement straightforwardly, as suggested in many tutorials. For prediction, you need to make the SOM ignore NaNs and calculate the BMU based only on the other values. After that, with the BMU you can get the corresponding features and use them to predict missing values or outliers.
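A minimal sketch of that prediction step in plain MATLAB (not a SOM Toolbox function; M is assumed to be the trained codebook matrix, one row per map unit):

% Find the BMU using only the observed features, then read the missing
% features off the BMU's codebook vector.
function xFilled = somImpute(M, x)        % M: units-by-features, x: 1-by-features
    obs = ~isnan(x);                      % mask of observed features
    d = sum((M(:,obs) - x(obs)).^2, 2);   % distance on observed dims only
    [~, bmu] = min(d);                    % best matching unit
    xFilled = x;
    xFilled(~obs) = M(bmu, ~obs);         % impute from the BMU
end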
I created a classifier using the libsvm toolbox in MATLAB. It is classifying all positive-class data as negative and vice versa. I got good results during cross-validation, but while testing some data I found that the classifier is working the wrong way around. I can't figure out where the problem lies.
Can anybody please help me with this?
This was a "feature" of prior versions of libsvm that occurred when the first training example's (binary) label was -1. The easiest solution is to upgrade to the latest version (> 3.17).
See here for more details: http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f430
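If upgrading is not an option, a workaround sketch using libsvm's MATLAB interface (variable names are illustrative): since old versions take whatever label appears first in the training set as the internal positive class, reordering so a +1 example comes first pins down the sign convention.

% Put a +1-labeled example first so the internal positive class is +1.
i1 = find(yTrain == 1, 1);
order = [i1; setdiff((1:numel(yTrain))', i1)];
model = svmtrain(yTrain(order), Xtrain(order,:), '-t 2 -c 1');
[pred, acc, dec] = svmpredict(yTest, Xtest, model);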
Suppose you have 500 training instances, 250 positive and 250 negative. In the testing set, the instances whose features are similar to the positives will be predicted as positive. But when you supplied the testing labels to LIBSVM (you have to provide testing labels so that LIBSVM can calculate accuracy; they are obviously not used in the prediction algorithm), you may have provided exactly reversed labels by mistake. So you have the feeling that your predicted labels have come out exactly opposite; note that even a random classifier will have 50% accuracy on a binary classification problem.
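A quick check for this (illustrative names, labels coded as +1/-1): compare accuracy against the test labels as given and against their negation. If the flipped number matches your cross-validation accuracy, the labels were inverted when the test file was built.

accAsGiven = 100 * mean(pred == yTest);
accFlipped = 100 * mean(pred == -yTest);   % labels coded as +1/-1
fprintf('as given: %.1f%%, flipped: %.1f%%\n', accAsGiven, accFlipped);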