quadprog in MATLAB taking lot of time - matlab

My goal is to classify an image using multi class linear SVM (with out kernel). I would like to write my own SVM classifier
I am using MATLAB and have trained linear SVM using image sets provided.
I have around 20 classes, 5 images in each class (total of 100 images) and I am using one-versus-all strategy.
Each image is a (112,92) matrix. That means 112*92=10304 values.
I am using quadprog(H,f,A,C) to solve the quadratic equation (y=w'x+b) in the SVM. One call to quadprog returns w vector of size 10304 for one image. That means I have to call quadprog for 100 times.
The problem is one quadprog call takes 35 secs to get executed. That means for 100 images it will take 3500 secs. This might be due to large size of vectors and matrices involved.
I want to reduce the execution time of quadprog. Is there any way to do it?

First of all, when you do classification using SVM, you usually extract a feature (like HOG) of an image, so that the dimensionality of space on which SVM has to operate gets reduced. You are using raw pixel values, which generates a 10304-D vector. That is not good. Use some standard feature.
Secondly, you do not call quadprog 100 times. You call only once. The concept behind the optimization is, you want to find a weight vector w and a bias b such that it satisfies w'x_i+b=y_i for all images (i.e. all x_i). Note that i will go from 1 to the number of examples in your training set, but w and b stay the same. If you wanted a (w,b) that will satisfy only one x, you do not need any fancy optimization. So stack your x in a big matrix of dimension NxD, y will be a vector of Nx1, and then call quadprog. It will take a longer time than 35 seconds, but you do it only once. This is called training an SVM. While testing for a new image, you just extract its feature, and do w*x+b to get its class.
Third, unless you are doing this as an exercise to understand SVMs, quadprog is not the best way to solve this problem. You need to use some other techniques which work well with large data, for example, Sequential Minimal Optimization.
How can one linear hyperplane classify more than 2 classes: For multi-class classification, SVMs usually use two popular approaches:
One-vs-one SVM: For a C class problem, you train C(C-1)/2 classifiers and at test time, you test that many classifiers and choose the class which receives most votes.
One-vs-all SVM: As name suggests, you train one classifier per class with positive samples from that class and negative samples from all other classes.
From LIBSVM FAQs:
It is one-against-one. We chose it after doing the following comparison: C.-W. Hsu and C.-J. Lin. A comparison of methods for multi-class support vector machines , IEEE Transactions on Neural Networks, 13(2002), 415-425.
"1-against-the rest" is a good method whose performance is comparable to "1-against-1." We do the latter simply because its training time is shorter.
However, also note that a naive implementation of one-vs-one may not be practical for large-scale problems. LIBSVM website also lists this shortcoming and provides an extension.
LIBLINEAR does not support one-versus-one multi-classification, so we provide an extension here. If k is the number of classes, we generate k(k-1)/2 models, each of which involves only two classes of training data. According to Yuan et al. (2012), one-versus-one is not practical for large-scale linear classification because of the huge space needed to store k(k-1)/2 models. However, this approach may still be viable if model vectors (i.e., weight vectors) are very sparse. Our implementation stores models in a sparse form and can effectively handle some large-scale data.

Related

Poor performance for SVM for unbalanced dataset- how to improve?

Consider a dataset A which has examples for training in a binary classification problem. I have used SVM and applied the weighted method (in MATLAB) since the dataset is highly imbalanced. I have applied weights as inversely proportional to the frequency of data in each class. This is done on training using the command
fitcsvm(trainA, trainTarg , ...
'KernelFunction', 'RBF', 'KernelScale', 'auto', ...
'BoxConstraint', C,'Weight',weightTrain );
I have used 10 folds cross-validation for training and learned the hyperparameter as well. so, inside CV the dataset A is split into train (trainA) and validation sets (valA). After training is over and outside the CV loop, I get the confusion matrix on A:
80025 1
0 140
where the first row is for the majority class and the second row is for the minority class. There is only 1 false positive (FP) and all minority class examples have been correctly classified giving true positive (TP) = 140.
PROBLEM: Then, I run the trained model on a new unseen test data set B which was never seen during training. This is the confusion matrix for testing on B .
50075 0
100 0
As can be seen, the minority class has not been classified at all, hence the purpose of weights has failed. Although, there is no FP the SVM fails to capture the minority class examples.
I have not applied any weights or balancing method such as sampling (SMOTE, RUSBoost etc) on B. What could be wrong and how to overcome this problem?
Class misclassification weights could be set instead of sample weights!
You can set the class weights based on the following example.
Mis-classification weight for class A(n-records; dominant) into class B (m-records; minority class) can be n/m.
Mis-classification weight For class B as class A can be set as 1 or m/n based on the severity, which you want to impose on the learning
c=[0 2.2;1 0];
mod=fitcsvm(X,Y,'Cost',c)
According to documentation:
For two-class learning, if you specify a cost matrix, then the
software updates the prior probabilities by incorporating the
penalties described in the cost matrix. Consequently, the cost matrix
resets to the default. For more details on the relationships and
algorithmic behavior of BoxConstraint, Cost, Prior, Standardize, and
Weights, see Algorithms.
Area Under Curve (AUC) is usually used to measure performance of models that applied on unbalanced data. It is also good to plot ROC curve to visually get more insights. Using only confusion matrix for such models may lead to misinterpretation.
perfcurve from the Statistics and Machine Learning Toolbox provides both functionalities.

One Class SVM using LibSVM in Matlab - Conceptual

Perhaps this is an easy question, but I want to make sure I understand the conceptual basis of the LibSVM implementation of one-class SVMs and if what I am doing is permissible.
I am using one class SVMs in this case for outlier detection and removal. This is used in the context of a greater time series prediction model as a data preprocessing step. That said, I have a Y vector (which is the quantity we are trying to predict and is continuous, not class labels) and an X matrix (continuous features used to predict). Since I want to detect outliers in the data early in the preprocessing step, I have yet to normalize or lag the X matrix for use in prediction, or for that matter detrend/remove noise/or otherwise process the Y vector (which is already scaled to within [-1,1]). My main question is whether it is correct to model the one class SVM like so (using libSVM):
svmod = svmtrain(ones(size(Y,1),1),Y,'-s 2 -t 2 -g 0.00001 -n 0.01');
[od,~,~] = svmpredict(ones(size(Y,1),1),Y,svmod);
The resulting model does yield performance somewhat in line with what I would expect (99% or so prediction accuracy, meaning 1% of the observations are outliers). But why I ask is because in other questions regarding one class SVMs, people appear to be using their X matrices where I use Y. Thanks for your help.
What you are doing here is nothing more than a fancy range check. If you are not willing to use X to find outliers in Y (even though you really should), it would be a lot simpler and better to just check the distribution of Y to find outliers instead of this improvised SVM solution (for example remove the upper and lower 0.5-percentiles from Y).
In reality, this is probably not even close to what you really want to do. With this setup you are rejecting Y values as outliers without considering any context (e.g. X). Why are you using RBF and how did you come up with that specific value for gamma? A kernel is total overkill for one-dimensional data.
Secondly, you are training and testing on the same data (Y). A kitten dies every time this happens. One-class SVM attempts to build a model which recognizes the training data, it should not be used on the same data it was built with. Please, think of the kittens.
Additionally, note that the nu parameter of one-class SVM controls the amount of outliers the classifier will accept. This is explained in the LIBSVM implementation document (page 4): It is proved that nu is an upper bound on the fraction of training errors and
a lower bound of the fraction of support vectors. In other words: your training options specifically state that up to 1% of the data can be rejected. For one-class SVM, replace can by should.
So when you say that the resulting model does yield performance somewhat in line with what I would expect ... ofcourse it does, by definition. Since you have set nu=0.01, 1% of the data is rejected by the model and thus flagged as an outlier.

measuring uncertainty in matlabs svmclassify

I'm doing contextual object recognition and I need a prior for my observations. e.g. this space was labeled "dog", what's the probability that it was labeled correctly? Do you know if matlabs svmclassify has an argument to return this level of certainty with it's classification?
If not, matlabs svm has the following structures in it:
SVM =
SupportVectors: [11x124 single]
Alpha: [11x1 double]
Bias: 0.0915
KernelFunction: #linear_kernel
KernelFunctionArgs: {}
GroupNames: {11x1 cell}
SupportVectorIndices: [11x1 double]
ScaleData: [1x1 struct]
FigureHandles: []
Can you think of any ways to compute a good measure of uncertainty from these? (Which support vector to use?) Papers/articles explaining uncertainty in SVMs welcome. More in depth explanations of matlabs SVM are also welcome.
If you can't do it this way, can you think of any other libraries with SVMs that have this measure of uncertainty?
Salam,
Hint: I would suggest that you modify the svmclassify.m function to pass the f values of the svmdecision.m to the user. This is in addition to the outputclass.
To access svmclassify.m, just type the following in the Matlab command line:
open svmclassify
I found that svmdecision.m (https://code.google.com/p/auc-recognition/source/browse/trunk/ALLMatlab/AucLib/DigitsOfflineNew/svmdecision.m?spec=svn260&r=260) can pass the f value; so the following may be substitude when calling the svmdecision.m in svmclassify.m:
[classified, f] = svmdecision(sample,svmStruct);
By further passing this f value to the user, the multi-class implementations such as one-vs-all may be designed with the binary classifier already built in Matlab.
f values are what you are looking for in terms of comparison of different inputs being related to their output class.
I hope this helps you to write your code and understand it! though you've probably solved your problem by now.
I know this was posted a long time ago, but I think the author is looking for a measure of uncertainty in the output of the trained SVM, whether the output is an estimated label or an estimated probability, which are both point estimates. One measure would be the variance of the output for the same input x. The problem is that regular SVMs are not stochastic models, such as probabilistic/Bayesian neural nets for example. The inference with an SVM is always the same, given the same x.
I would need to check, but perhaps it would be possible to train an SVM with stochastic regularization. Maybe some form of stochastic margin maximization, whereby during the steps of the optimization routine, input training vectors can undergo small stochastic perturbations or perhaps randomly chosen features could be dropped out. Similarly, during testing, one could drop or modify different randomly chosen features, producing different point estimates each time. Then, you could take the variance of the estimates, producing the measure of uncertainty.
It could be argued that, if the input x presents patterns that the model is not familiar with, the point estimates will be wildly unstable and variable.
One simple illustration can be the following: Imagine a 2d toy example, where the two classes are well separated and occupy a dense range of feature values. While a model trained on this set will generalize well to points that fall into the distribution/range of the values seen in the train data, it would be quite unstable given a test observation that sits far away from both classes, but at the same latitude, so to speak, as the separating hyperplane. Such an observations presents a challenge to the model since a tiny perturbation of one of the support vectors could cause a tiny rotation in the separating hyperplane, not significantly changing training or validation error, but potentially changing the estimate of the far-away test observation by a lot.
Are you supplying the data and doing the training yourself? If so, the best thing to do is to partition your data into training and test sets. Matlab has a function for this called cvpartition. You can use the classification results on the test data to estimate the rate of false positive and the miss rate. For a binary classification task, those two numbers will quantify the uncertainty. For a classification test with multiple hypotheses the best thing to do, probably, is to compile your results in a confusion matrix.
edit. found some older code I'd used that might help a little
P=cvpartition(Y,'holdout',0.20);
rbfsigma=1.41;
svmStruct=svmtrain(X(P.training,:),Y(P.training),'kernel_function','rbf','rbf_sigma',rbfsigma,'boxconstraint',0.7,'showplot','true');
C=svmclassify(svmStruct,X(P.test,:));
errRate=sum(Y(P.test)~=C)/P.TestSize
conMat=confusionmat(Y(P.test),C)
LIBSVM, which also has a Matlab interface, has an option -b that makes the classification function return probability estimates. They seem to be computed following the general approach of Platt (2000), which is to perform a one-dimensional logistic regression applied to the decision values.
Platt, J. C. (2000). Probabilities for SV machines. In Smola, A. J. et al. (eds.) Advances in large margin classifiers. pp. 61–74. Cambridge: The MIT Press.

Good results with NN, not with SVM; cause for concern?

I have painstakingly gathered data for a proof-of-concept study I am performing. The data consists of 40 different subjects, each with 12 parameters measured at 60 time intervals and 1 output parameter being 0 or 1. So I am building a binary classifier.
I knew beforehand that there is a non-linear relation between the input-parameters and the output so a simple perceptron of Bayes classifier would be unable to classify the sample. This assumption proved correct after initial tests.
Therefore I went to neural networks and as I hoped the results were pretty good. An error of about 1-5% is generally the result. The training is done by using 70% as training and 30% as evaluation. Running the complete dataset again (100%) through the model I was very happy with the results. The following is a typical confusion matrix (P = positive, N = negative):
P N
P 13 2
N 3 42
So I am happy and with the notion that I used a 30% for evaluation I am confident that I am not fitting noise.
Therefore I resolved to SVM for a double check and the SVM was unable to converge to a good solution. Most of the time the solutions are terrible (say 90% error...). Maybe I am not fully aware of SVM's or the implementations are not correct, but it troubles me because I thought that when NN provide a good solution, SVM's are most of the time better in seperating the data due to their maximum-margin hyperplane.
What does this say of my result? Am I fitting noise? And how do I know if this is a correct result?
I am using Encog for the calculations but the NN results are comparable to home-grown NN models I made.
If it is your first time to use SVM, I strongly recommend you to take a look at A Practical Guide to Support Vector Classication, by authors of a famous SVM package libsvm. It gives a list of suggestions to train your SVM classifier.
Transform data to the format of an SVM package
Conduct simple scaling on the data
Consider the RBF kernel
Use cross-validation to nd the best parameter C and γ
Use the best parameter C and γ
to train the whole training set
Test
In short, try scaling your data and carefully choosing the kernal plus the parameters.

SVM Classification with Cross Validation

I am new to using Matlab and am trying to follow the example in the Bioinformatics Toolbox documentation (SVM Classification with Cross Validation) to handle a classification problem.
However, I am not able to understand Step 9, which says:
Set up a function that takes an input z=[rbf_sigma,boxconstraint], and returns the cross-validation value of exp(z).
The reason to take exp(z) is twofold:
rbf_sigma and boxconstraint must be positive.
You should look at points spaced approximately exponentially apart.
This function handle computes the cross validation at parameters
exp([rbf_sigma,boxconstraint]):
minfn = #(z)crossval('mcr',cdata,grp,'Predfun', ...
#(xtrain,ytrain,xtest)crossfun(xtrain,ytrain,...
xtest,exp(z(1)),exp(z(2))),'partition',c);
What is the function that I should be implementing here? Is it exp or minfn? I will appreciate if you can give me the code for this section. Thanks.
I will like to know what does it mean when it says exp([rbf_sigma,boxconstraint])
rbf_sigma: The svm is using a gaussian kernel, the rbf_sigma set the standard deviation (~size) of the kernel. To understand how kernels work, the SVM is putting the kernel around every sample (so that you have a gaussian around every sample). Then the kernels are added up (sumed) for the samples of each category/type. At each point the type which sum is higher would be the "winner". For example if type A has a higher sum of these kernels at point X, then if you have a new datum to classify in point X, it will be classified as type A. (there are other configuration parameters that may change the actual threshold where a category is selected over another)
Fig. Analyze this figure from the webpage you gave us. You can see how by adding up the gaussian kernels on the red samples "sumA", and on the green samples "sumB"; it is logical that sumA>sumB in the center part of the figure. It is also logical that sumB>sumA in the outer part of the image.
boxconstraint: it is a cost/penalty over miss-classified data. During the training stage of the classifier, where you use the training data to adjust the SVM parameters, the training algorithm is using an error function to decide how to optimize the SVM parameters in an iterative fashion. The cost for a miss-classified sample is proportional to how far it is from the boundary where it would have been classified correctly. In the figure that I am attaching the boundary is the inner blue circumference.
Taking into account BGreene indications and from what I understand of the tutorial:
In the tutorial they advice to try values for rbf_sigma and boxconstraint that are exponentially apart. This means that you should compare values like {0.2, 2, 20, ...} (note that this is {2*10^(i-2), i=1,2,3,...}), and NOT like {0.2, 0.3, 0.4, 0.5} (which would be linearly apart). They advice this to try a wide range of values first. You can further optimize later FROM the first optimum that you obtained before.
The command "[searchmin fval] = fminsearch(minfn,randn(2,1),opts)" will give you back the optimum values for rbf_sigma and boxconstraint. Probably you have to use exp(z) because it affects how fminsearch increments the values of z(1) and z(2) during the search for the optimum value. I suppose that when you put exp(z(1)) in the definition of #minfn, then fminsearch will take 'exponentially' big steps.
In machine learning, always try to understand that there are three subsets in your data: training data, cross-validation data, and test data. The training set is used to optimize the parameters of the SVM classifier for EACH value of rbf_sigma and boxconstraint. Then the cross validation set is used to select the optimum value of the parameters rbf_sigma and boxconstraint. And finally the test data is used to obtain an idea of the performance of your classifier (the efficiency of the classifier is determined upon the test set).
So, if you start with 10000 samples you may divide the data for example as training(50%), cross-validation(25%), test(25%). So that you will sample randomly 5000 samples for the training set, then 2500 samples from the 5000 remaining samples for the cross-validation set, and the rest of samples (that is 2500) would be separated for the test set.
I hope that I could clarify your doubts. By the way, if you are interested in the optimization of the parameters of classifiers and machine learning algorithms I strongly suggest that you follow this free course -> www.ml-class.org (it is awesome, really).
You need to implement a function called crossfun (see example).
The function handle minfn is passed to fminsearch to be minimized.
exp([rbf_sigma,boxconstraint]) is the quantity being optimized to minimize classification error.
There are a number of functions nested within this function handle:
- crossval is producing the classification error based on cross validation using partition c
- crossfun - classifies data using an SVM
- fminsearch - optimizes SVM hyperparameters to minimize classification error
Hope this helps