Matlab libsvm svmpredict accuracy verbose - matlab

I have a question of an annoying fact. I'm using libsvm with matlab and I'am able to predict using:
predicted_label = svmpredict(Ylabel, Xlabel, model);
but it happen that every time I make a predictions appears this:
Accuracy = X% (y/n) (classification)
Which I find annoying because I am repeating this procedure a lot of times and also makes it slow because its displaying in screen.
I think what I want is to avoid that svmpredict being verbose.
Can anyone help me with this? Thanks in advance.
-Jessica

I found a much better approach than editing the source code of the c library was to use matlabs evalc which places any output to the first output argument.
[~ predicted_label] = evalc('svmpredict(Ylabel, Xlabel, model)');
Because the string to be evaluated is fixed should be no performance decrease.

svmpredict(Ylabel, Xlabel, model, '-q');
From the manual:
Usage: [predicted_label, accuracy, decision_values/prob_estimates] = svmpredict(testing_label_vector, testing_instance_matrix, model, 'libsvm_options')
[predicted_label] = svmpredict(testing_label_vector, testing_instance_matrix, model, 'libsvm_options')
Parameters:
model: SVM model structure from svmtrain.
libsvm_options:
-b probability_estimates: whether to predict probability estimates, 0 or 1 (default 0); one-class SVM not supported yet
-q : quiet mode (no outputs)

If you are using matlab, just find the line of code that is displaying this information (usually using 'disp', 'sprintf', or 'fprintf') and comment it out using the commenting operator %.
example:
disp(['Accuracy= ' num2str(x)]);
change it to:
% disp(['Accuracy= ' num2str(x)]);
If you are using the main libsvm library then you need to modify it before making.
1- Open the file 'svmpredict.c'
2- find this line of code:
info("Accuracy = %g%% (%d/%d) (classification)\n",
(double)correct/total*100,correct,total);
3- just comment it out using // operator
4- save and close the file
5- make the project

Related

How can i check how the predict function in MATLAB is working?

I have a very simple MATLAB program for training and testing a regression tree, I use the same Data carsmall that is in the tutorial examples:
clear all
clc
close all
load carsmall
X = [Cylinders, Weight, Horsepower, Displacement];
Y = MPG;
tree = fitrtree(X,Y, 'PredictorNames',{'Cylinders', 'Weight', 'Horsepower', 'Displacement'},'ResponseName','MPG','MinLeaf',10);
Xtest=[6,4100,150,130];
MPGest = predict(tree, Xtest);
This gives as a result MPGest=14.9167
I want to know how the predict function is arriving at that value, usually to understand I go line by line inside the function. This one is very tricky because uses classes so I arrive at this line
node = findNode(this.Impl,X,this.DataSummary.CategoricalPredictors,subtrees);
and inside that function I arrive to
n = classreg.learning.treeutils.findNode(X,...
subtrees,this.PruneList,...
this.Children',iscat,...
this.CutVar,this.CutPoint,this.CutCategories,...
this.SurrCutFlip,this.SurrCutVar,...
this.SurrCutPoint,this.SurrCutCategories,...
verbose);
when I try to step in at this step it just give me n=10, how is MATLAB arriving at this number? for example, If I wanted to make my own program to calculate this number using the tree object as input without using predict?
Actually, the function you are looking for is defined into a MEX file. If you try to open it using the open function, you will get the following outcome:
open 'classreg.learning.treeutils.findNode'
Error using open (line 145) Cannot edit the MEX-file
'C:...\toolbox\stats\classreg+classreg+learning+treeutils\findNode.mexw64'
Unfortunately, MEX files are compiled from C++ sources into bytecode (better known as assembly language). There are many decompilers out there that you can use in order to rebuild the instructions compiled into the library (this is a nice starting point, if you want a fast overview), and the whole process is feasible especially because the file itself is quite small.
The code you will get back will not be the original source code, but something similar: variables will have a default and meaningless name, there will be pointers all over and it may also contain bugs due to a wrong reversing of some assembly instructions. This will be probably enough to let you understand what's going on, but you will not be able to follow the computations through a step by step debugging session in Matlab.
Now, the only question you have to answer to is: is this worth the effort?

Matlab: How can I store the output of "fitcecoc" in a database

In Matlab help section, there's a very helpful example to solve classification problems under "Digit Classification Using HOG Features". You can easily execute the full script by clikcing on 'Open this example'. However, I'm wondering if there's a way to store the output of "fitcecoc" in a database so you don't have to keep training and classifying each and everytime you run the code. Here is the section of the code that's relevant to my question:
% fitcecoc uses SVM learners and a 'One-vs-One' encoding scheme.
classifier = fitcecoc(trainingFeatures, trainingLabels);
So, all I want to do is store 'classifier' in a database and retrieve it for the following code:
predictedLabels = predict(classifier, testFeatures);
Look at Database Toolbox in Matlab.
You could just save the classifier variable in a file:
save('classifier.mat','classifier')
And then load it before executing predict:
load('classifier.mat')
predictedLabels = predict(classifier, testFeatures);

Performing additional validation in LIBSVM matlab

I am working on MATLAB LIBSVM for a while to do prediction. I have a dataset out of which I use 75% for training, 15% for finding best parameters and remaining for testing. The code is given below.
trainX and trainY are the input and output training instances
testValX and testValY are the validation dataset I use
for j = 1:100
for jj = 1:10
model(j,jj) = svmtrain(trainY,trainX,...
['-s 3 -t 2 -c ' num2str(j) ' -p 0.001 -g ' num2str(jj) '-v 5']);
[predicted_label, ~, ~]=svmpredict(testValY,...
testValX,model(j,jj));
MSE(j,jj) = sum(((predicted_label-testValY).^2)/2);
end
end
[min_val,min_indi] = min(MSE(:));
best_predicted_model_rbf(i) = model(min_indi);
My question here is whether this is correct. I am creating model matrix with different values of c and g. I use -v option which is a key here. From the predicted models I use validation dataset for prediction and there by compute mean square error. Using this MSE I pick the best c and g. Since I am using -v which returns the cross validated output, is the procedure I follow correct?
First, I think there is a slight problem with the code shown, which is that num2str(jj) '-v 5']); doesn't have a space before the -v. That may cause that flag to not be read. In the other question, you stated that this 'sometimes returns a model', which is what would happen if that flag was not read. If the flag is read, you should only get a number, not a model, when the '-v' flag is used.
Second, it looks like you are doing two different things here, either one of which would be reasonable on its own. Calling svmtrain with '-v' runs cross validation on the training set. That shouldn't return a model, it should just return an mse estimate. You could use these estimates to determine which parameter setting was best, and then train one model with that setting on all of the training data.
Anyway, next you call svmpredict(y,x,model) on a hold-out validation set, testValX, but having called svmtrain with '-v', model should just be a scalar at this point. In order for this call to run correctly, you have to get the model from svmtrain without '-v', so that it is a struct. The rest of what you are doing makes sense for this case, in which you are doing hold-out validation using testValX.

LIBSVM in MATLAB/Octave - what's the output of libsvmread?

The second output of the libsvmread command is a set of features for each given training example.
For example, in the following MATLAB command:
[heart_scale_label, heart_scale_inst] = libsvmread('../heart_scale');
This second variable (heart_scale_inst) holds content in a form that I don't understand, for example:
<1, 1> -> 0.70833
What is the meaning of it? How is it to be used (I can't plot it, the way it is)?
PS. If anyone could please recommend a good LIBSVM tutorial, I'd appreciate it. I haven't found anything useful and the README file isn't very clear... Thanks.
The definitive tutorial for LIBSVM for beginners is called: A Practical Guide to SVM Classification it is available from the site of the authors of LIBSVM.
The second parameter returned is called the instance matrix. It is a matrix, let call it M, M(1,:) are the features of data point 1 and so on. The matrix is sparse that is why it prints out weirdly. If you want to see it fully print full(M).
[heart_scale_label, heart_scale_inst] = libsvmread('../heart_scale');
with heart_scale_label and heart_scale_inst you should be able to train an SVM by issuing:
mod = svmtrain(heart_scale_label,heart_scale_inst,'-c 1 -t 0');
I strong suggest you read the above linked guide to learn how to set the c parameter (and possibly, in case of RBF kernel the gamma parameter), but the above line is how you would train with that data.
I think it is the probability with which test case has been predicted to heart_scale label category

Matlab Weka Interface AdaBoost Issues: Out of Bounds Exception

I'm doing some cross-validation using a Matlab Weka Interface that I got from file exchange. My loop structure seems to work fine for Weka's Logistic classifier. However, when I try to do the exact same thing for AdaBoostM1, it throws the following error:
??? Java exception occurred: java.lang.ArrayIndexOutOfBoundsException
Error in ==> wekaClassify at 24 classProbs(t+1,:) = (classifier.distributionForInstance(testData.instance(t)))';
Error in ==> classifier_search at 225 [pred ~] = wekaClassify(matlab2weka('instance', featurelabels, tester), classifier);
I have determined through some testing that this only occurs when the number of instances in the training set is greater than the number of instances in the test set. I am sure you can see why that is a problem for me, since in most situations the training set is greater than the test set in size.
Is there something different about how I should format my inputs when using Adaboost rather than Logistic? Any information you can give regarding this problem would be so helpful.
I downloaded this code from this page: http://www.mathworks.com/matlabcentral/fileexchange/21204-matlab-weka-interface
Emails bounce from the account of the guy who made it, and he doesn't seem to respond to comments on the page - I'm hoping that maybe someone here has used this.
EDIT: Here is the code that I use to train and test the classifier:
classifier = trainWekaClassifier(matlab2weka('training', featurelabels, train), 'meta.AdaBoostM1', { strcat('-P 100 -S 1 -I ', num2str(r), '-W weka.classifiers.trees.DecisionStump')});
[pred ~] = wekaClassify(matlab2weka('instance', featurelabels, tester), classifier);
I haven't used this combination of software, so I can only take a guess at what could cause this.
Are your training/testing data matrices the right way round? They should be N-by-D (N instances, D features).
If you were passing in a D-by-N training matrix and a D-by-M testing matrix, then I would expect it to work only when M < N - which is what you describe - and even then, it wouldn't give a meaningful result.