I am trying to use LibSVM for regression. I am trying to detect faces (10 classes of different faces). I labeled 1-10 as face class and 11 is for non face. I want to develop a script usig LibSVM which will give me a continuous score between 0-1 if the test image falls to any of the 10 face class, otherwise it will give me -1 (non-face). From this score, I can be able to predict my calss. If the test image is matched with 1st class, the score should be around .1. Similarly if the test image is matched with class 10, the score should be around 1 (any continuous value close to 1). I am trying to use SVR using LibSVM to solve this problem. I can easily get the predicted class through classification. But I want a continuous score value which I can get through regression. Now, I was looking in the net for the function or parameters in function for SVR using LibSVM, but I couldn't find anything. Can anybody please help me in this regard?
This is not a regression problem. Solving it through regression will not yield good results.
You are dealing with a multiclass classification problem. The best way to approach this is to construct 10 one-vs-all classifiers with probabilistic output. To get probabilistic output (e.g. in the interval [0,1]), you can train and predict with the -b 1 option for C-SVC (-s 0).
If any of the 10 classifiers yields a sufficiently large probability for its positive class, you use that probability (which is close to 1). If none of the 10 classifiers yield a positive label with high enough confidence you can default to -1.
So in short: make a multiclass classifier containing one-vs-all classifiers with probabilistic output. Subsequently post-process the predictions as I described, using a probability threshold of your choice (for example 0.7).
Related
Why is h2o.randomforest calculating MSE on Out of bag sample and while training for a multinomail classification problem?
I have done binary classification also using h2o.randomforest, there it used to calculate AUC on out of bag sample and while training but for multi classification random forest is calculating MSE which seems suspicious. Please see this screenshot.
My target variable was a factor containing 4 factor levels model1, model2, model3 and model4. In the screenshot you would also a confusion matrix for these factors.
Can someone please explain this behaviour?
Both binomial and multinomial classification display MSE, so you will see it in the Scoring History table for both models (highlighted training_MSE column).
H2O does not evaluate a multinomial AUC. A few evaluation methods exist, but there is not yet a single widely adopted method. The pROC package discusses the method of Hand and Till, but mentions that it cannot be plotted and results rarely tested. Log loss and classification error are still available, specific to classification, as each has standard methods of evaluation in a multinomial context.
There is a confusion matrix comparing your 4 factor levels, as you highlighted. Can you clarify what more you are expecting? If you were looking for four individual confusion matrices, the four-column table contains enough information that they could be computed.
I am trying to differentiate between two classes of data for forecasting. Basically the dependent variables are features of a signal that I want to forecast. I want to predict whether the signal will have a positive or negative slope in the near future (1 time step ahead). I have tried with different time series analysis, such as Fourier analysis, fitting using neural networks, auto-regressive models, and classification with neural nets (using patternet in Matlab).
The function is continuous, so the most logical assumption is to use some regression analysis tool to determine what's going to happen. However, since I only care whether the slope is going to positive or negative, I changed the signal to a binary signal (1 if the slope is positive, -1 if the slope is 0 or negative).
This is by the far the best results I have gotten! However, for some unknown reason a neural net designed for classification did not work (the confusion matrix stated that there was a precision of around 50%). So I decided to try with a regular feedforward neural net...
Since the neural network outputs continuous data, I didn't know what to do... But then I remembered about Logistic regression, and since its transfer function is a log function (bounded by 0 and 1), it can be interpreted as a probability. So I basically did the same, defined a threshold (e.g above 0 is 1, below 0 is -1), and voila! The precision sky-rocked! I am getting a precision of around 70-80%.
Since I am using a sigmoid transfer function, the neural network wll have a continuous output just as logistic regression (but on this case between -1 and 1), so I am assuming my approach is technically still regression and not classification. My question is... Which is better? For my specific problem where fitting did not give really good results but I had to convert this to a binary problem... Which should give better results? Classification or regression?
Should I try a different configuration of a neural net (with a different transfer function), should I try with support vector machine or any other classification algorithm? Or should I stick with regression but defining a threshold myself just as I would do with logistic regression?
I created a classifier using libsvm toolbox in Matlab. It is classifying all positive class data as negative class and vice versa. I got good result while doing cross validation but while testing some data I am finding that classifier is working in wrong way. I can't seem to figure out where the problem lies.
Can anybody please help me on this matter.
This was a "feature" of prior versions of libsvm when the first training example's (binary) label was -1. The easiest solution is to get the latest version (> 3.17).
See here for more details: http://www.csie.ntu.edu.tw/~cjlin/libsvm/faq.html#f430
Suppose you have 500 training instances. 250 will be positive and other negative. Now in the testing set, the instances which have same features as positive will get predicted as positive. But when you are supplying testing labels (you have to provide testing labels so that LIBSVM can calculate accuracy, they won't obviously be used in the preiction algorithm) to LIBSVM, you have provided exactly reverse labels (by mistake). So you have a feeling that your predicted labels have come out exactly opposite. Because even a random classifier will have a 50% accuracy for a binary classification problem.
I have trained Feed Forward NN using Matlab Neural Network Toolbox on a dataset containing speech features and accelerometer measurements. Targetset contains two target classes for dataset: 0 and 1. The training, validation and performance are all fine and I have generated code for this network.
Now I need to use this neural network in real-time to recognize pattern when occur and generate 0 or 1 when I test a new dataset against previously trained NN. But when I issue a command:
c = sim(net, j)
Where "j" is a new dataset[24x11]; instead 0 or 1 i get this as an output (I assume I get percent of correct classification but there is no classification result itself):
c =
Columns 1 through 9
0.6274 0.6248 0.9993 0.9991 0.9994 0.9999 0.9998 0.9934 0.9996
Columns 10 through 11
0.9966 0.9963
So is there any command or a way that I can actually see classification results? Any help highly appreciated! Thanks
I'm no matlab user, but from a logical point of view, you are missing an important point:
The input to a Neural Network is a single vector, you are passing a matrix. Thus matlab thinks that you want to classify a bunch of vectors (11 in your case). So the vector that you get is the output activation for every of these 11 vectors.
The output activation is a value between 0 and 1 (I guess you are using the sigmoid), so this is perfectly normal. Your job is to get a threshold that fits your data best. You can get this threshold with cross validation on your training/test data or by just choosing one (0.5?) and see if the results are "good" and modify if needed.
NNs normally convert their output to a value within (0,1) using for example the logistic function. It's not a percentage or probability, just a relative measure of certainty. In any case this means is that you have to manually use a threshold (such as 0.5) to discriminate the two classes. Which threshold is best is tough to find because you must select the optimum trade off between precision and recall.
I am new to using Matlab and am trying to follow the example in the Bioinformatics Toolbox documentation (SVM Classification with Cross Validation) to handle a classification problem.
However, I am not able to understand Step 9, which says:
Set up a function that takes an input z=[rbf_sigma,boxconstraint], and returns the cross-validation value of exp(z).
The reason to take exp(z) is twofold:
rbf_sigma and boxconstraint must be positive.
You should look at points spaced approximately exponentially apart.
This function handle computes the cross validation at parameters
exp([rbf_sigma,boxconstraint]):
minfn = #(z)crossval('mcr',cdata,grp,'Predfun', ...
#(xtrain,ytrain,xtest)crossfun(xtrain,ytrain,...
xtest,exp(z(1)),exp(z(2))),'partition',c);
What is the function that I should be implementing here? Is it exp or minfn? I will appreciate if you can give me the code for this section. Thanks.
I will like to know what does it mean when it says exp([rbf_sigma,boxconstraint])
rbf_sigma: The svm is using a gaussian kernel, the rbf_sigma set the standard deviation (~size) of the kernel. To understand how kernels work, the SVM is putting the kernel around every sample (so that you have a gaussian around every sample). Then the kernels are added up (sumed) for the samples of each category/type. At each point the type which sum is higher would be the "winner". For example if type A has a higher sum of these kernels at point X, then if you have a new datum to classify in point X, it will be classified as type A. (there are other configuration parameters that may change the actual threshold where a category is selected over another)
Fig. Analyze this figure from the webpage you gave us. You can see how by adding up the gaussian kernels on the red samples "sumA", and on the green samples "sumB"; it is logical that sumA>sumB in the center part of the figure. It is also logical that sumB>sumA in the outer part of the image.
boxconstraint: it is a cost/penalty over miss-classified data. During the training stage of the classifier, where you use the training data to adjust the SVM parameters, the training algorithm is using an error function to decide how to optimize the SVM parameters in an iterative fashion. The cost for a miss-classified sample is proportional to how far it is from the boundary where it would have been classified correctly. In the figure that I am attaching the boundary is the inner blue circumference.
Taking into account BGreene indications and from what I understand of the tutorial:
In the tutorial they advice to try values for rbf_sigma and boxconstraint that are exponentially apart. This means that you should compare values like {0.2, 2, 20, ...} (note that this is {2*10^(i-2), i=1,2,3,...}), and NOT like {0.2, 0.3, 0.4, 0.5} (which would be linearly apart). They advice this to try a wide range of values first. You can further optimize later FROM the first optimum that you obtained before.
The command "[searchmin fval] = fminsearch(minfn,randn(2,1),opts)" will give you back the optimum values for rbf_sigma and boxconstraint. Probably you have to use exp(z) because it affects how fminsearch increments the values of z(1) and z(2) during the search for the optimum value. I suppose that when you put exp(z(1)) in the definition of #minfn, then fminsearch will take 'exponentially' big steps.
In machine learning, always try to understand that there are three subsets in your data: training data, cross-validation data, and test data. The training set is used to optimize the parameters of the SVM classifier for EACH value of rbf_sigma and boxconstraint. Then the cross validation set is used to select the optimum value of the parameters rbf_sigma and boxconstraint. And finally the test data is used to obtain an idea of the performance of your classifier (the efficiency of the classifier is determined upon the test set).
So, if you start with 10000 samples you may divide the data for example as training(50%), cross-validation(25%), test(25%). So that you will sample randomly 5000 samples for the training set, then 2500 samples from the 5000 remaining samples for the cross-validation set, and the rest of samples (that is 2500) would be separated for the test set.
I hope that I could clarify your doubts. By the way, if you are interested in the optimization of the parameters of classifiers and machine learning algorithms I strongly suggest that you follow this free course -> www.ml-class.org (it is awesome, really).
You need to implement a function called crossfun (see example).
The function handle minfn is passed to fminsearch to be minimized.
exp([rbf_sigma,boxconstraint]) is the quantity being optimized to minimize classification error.
There are a number of functions nested within this function handle:
- crossval is producing the classification error based on cross validation using partition c
- crossfun - classifies data using an SVM
- fminsearch - optimizes SVM hyperparameters to minimize classification error
Hope this helps