How to fix the fisheriris cross classification - matlab

I tried to run the code below, which I found online, but it does not work. The error is:
Error using svmclassify (line 53)
The first input should be a `struct` generated by `SVMTRAIN`.
Error in fisheriris_classification (line 27)
pred = svmclassify(svmModel, meas(testIdx,:), 'Showplot',false);
Can anyone help me fix this problem? Thank you so much!
clear all;
close all;
load fisheriris %# load iris dataset
groups = ismember(species,'setosa'); %# create a two-class problem
%# number of cross-validation folds:
%# If you have 50 samples, divide them into 10 groups of 5 samples each,
%# then train with 9 groups (45 samples) and test with 1 group (5 samples).
%# This is repeated ten times, with each group used exactly once as a test set.
%# Finally the 10 results from the folds are averaged to produce a single
%# performance estimation.
k=10;
cvFolds = crossvalind('Kfold', groups, k); %# get indices of 10-fold CV
cp = classperf(groups); %# init performance tracker
for i = 1:k %# for each fold
testIdx = (cvFolds == i); %# get indices of test instances
trainIdx = ~testIdx; %# get indices training instances
%# train an SVM model over training instances
svmModel = svmtrain(meas(trainIdx,:), groups(trainIdx), ...
'Autoscale',true, 'Showplot',false, 'Method','QP', ...
'BoxConstraint',2e-1, 'Kernel_Function','rbf', 'RBF_Sigma',1);
%# test using test instances
pred = svmclassify(svmModel, meas(testIdx,:), 'Showplot',false);
%# evaluate and update performance object
cp = classperf(cp, pred, testIdx);
end
%# get accuracy
cp.CorrectRate
%# get confusion matrix
%# columns:actual, rows:predicted, last-row: unclassified instances
cp.CountingMatrix
%with the output:
%ans =
% 0.99333
%ans =
% 100 1
% 0 49
% 0 0

The cause appears to be the way MATLAB resolves functions on the search path. I am fairly certain that it is still calling the LIBSVM svmtrain rather than the built-in MATLAB function, so the model passed to svmclassify is not the struct that MATLAB's own SVMTRAIN would produce. Here is more information about the search path:
http://www.mathworks.com/help/matlab/matlab_env/what-is-the-matlab-search-path.html
To verify whether this is the issue, please try the following command in the command window:
>> which -all svmtrain
You should find that the built-in function is being shadowed by the LIBSVM function. You can either remove LIBSVM from the MATLAB search path using the "Set Path" tool in the Toolstrip, or run your code from a different directory that does not contain the LIBSVM files. I would recommend the first option. To read more about the built-in MATLAB functions, check these links:
http://www.mathworks.com/help/stats/svmtrain.html
http://www.mathworks.com/help/stats/svmclassify.html
If you would like to continue using LIBSVM, I would recommend checking out the following site:
https://www.csie.ntu.edu.tw/~cjlin/index.html
Hope this helps.
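The check and the path cleanup described above can also be scripted. This is a minimal sketch: the folder name in libsvmDir is a hypothetical placeholder for wherever the LIBSVM MATLAB interface was unpacked, and rmpath only affects the current session unless the path is saved afterwards.

```matlab
% List every svmtrain on the search path; the first entry is the one MATLAB calls.
hits = which('svmtrain', '-all');
disp(hits);

% Assumption: hypothetical location of the LIBSVM MATLAB interface.
libsvmDir = '/path/to/libsvm/matlab';

% Remove the folder from the search path only if it is actually on it.
onPath = any(strcmp(libsvmDir, strsplit(path, pathsep)));
if onPath
    rmpath(libsvmDir);
end
```

After running this, calling which -all svmtrain again should no longer list the LIBSVM copy first.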

Related

Example of 10-fold cross-validation with Neural network classification in MATLAB

I am looking for an example of applying 10-fold cross-validation to a neural network. I need something like the answer to this question: Example of 10-fold SVM classification in MATLAB
I would like to classify all 3 classes, whereas that example only considered two classes.
Edit: here is the code I wrote for the iris example:
load fisheriris %# load iris dataset
k=10;
cvFolds = crossvalind('Kfold', species, k); %# get indices of 10-fold CV
net = feedforwardnet(10);
for i = 1:k %# for each fold
testIdx = (cvFolds == i); %# get indices of test instances
trainIdx = ~testIdx; %# get indices training instances
%# train
net = train(net,meas(trainIdx,:)',species(trainIdx)');
%# test
outputs = net(meas(trainIdx,:)');
errors = gsubtract(species(trainIdx)',outputs);
performance = perform(net,species(trainIdx)',outputs)
figure, plotconfusion(species(trainIdx)',outputs)
end
error given by matlab:
Error using nntraining.setup>setupPerWorker (line 62)
Targets T{1,1} is not numeric or logical.
Error in nntraining.setup (line 43)
[net,data,tr,err] = setupPerWorker(net,trainFcn,X,Xi,Ai,T,EW,enableConfigure);
Error in network/train (line 335)
[net,data,tr,err] = nntraining.setup(net,net.trainFcn,X,Xi,Ai,T,EW,enableConfigure,isComposite);
Error in Untitled (line 17)
net = train(net,meas(trainIdx,:)',species(trainIdx)');
It's a lot simpler to just use MATLAB's crossval function than to do it manually using crossvalind. Since you are just asking how to get the test "score" from cross-validation, as opposed to using it to choose an optimal parameter like for example the number of hidden nodes, your code will be as simple as this:
load fisheriris;
% // Split up species into 3 binary dummy variables
S = unique(species);
O = [];
for s = 1:numel(S)
O(:,end+1) = strcmp(species, S{s});
end
% // Crossvalidation
vals = crossval(@(XTRAIN, YTRAIN, XTEST, YTEST)fun(XTRAIN, YTRAIN, XTEST, YTEST), meas, O);
All that remains is to write the function fun, which takes the input and output training and test sets (all provided to it by crossval, so you don't need to worry about splitting the data yourself), trains a neural net on the training set, tests it on the test set, and then outputs a score using your preferred metric. So something like this:
function testval = fun(XTRAIN, YTRAIN, XTEST, YTEST)
net = feedforwardnet(10);
net = train(net, XTRAIN', YTRAIN');
yNet = net(XTEST');
%// find which output (of the three dummy variables) has the highest probability
[~,classNet] = max(yNet',[],2);
%// convert YTEST into class indices, row by row, so they line up with classNet
%// (find would return the column indices in column-major order, not row order)
[~,classTest] = max(YTEST,[],2);
%// Check the success of the classifier
cp = classperf(classTest, classNet);
testval = cp.CorrectRate; %// replace this with your preferred metric
end
I don't have the neural network toolbox so I am unable to test this I'm afraid. But it should demonstrate the principle.
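As a sanity check on the dummy-variable encoding used in this answer, the sketch below builds the same N-by-3 matrix without a loop and recovers the class index of every row. The names idx and backToIdx are illustrative and not part of the original answer; it only assumes the fisheriris data from the Statistics Toolbox.

```matlab
load fisheriris

% idx(n) is the class number (1..3) of sample n, S holds the class names.
[S, ~, idx] = unique(species);

% N-by-3 dummy matrix: column c is 1 where the sample belongs to class c.
O = double(bsxfun(@eq, idx, 1:numel(S)));

% Recover the class index of each row; max works row by row,
% so the result stays aligned with the original sample order.
[~, backToIdx] = max(O, [], 2);

disp(isequal(backToIdx, idx));
```

Since crossval returns one value of testval per fold, mean(vals) gives a single cross-validated score.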

Cross Validation Using libsvm

I am currently performing 5-fold cross-validation using this code:
%# read some training data
[labels,data] = libsvmread('Training_Data_libsvmFormat.txt');
%# grid of parameters
folds = 5;
[C,gamma] = meshgrid(-5:2:15, -15:2:3) %Coarse Grid Search: bestC = 8 bestGamma = 2
%[C,gamma] = meshgrid(1:0.5:4, -1:0.25:3) %Fine Grid Search: bestC = 4 bestGamma = 2
%# grid search, and cross-validation
cv_acc = zeros(numel(C),1);
for i=1:numel(C)
cv_acc(i) = svmtrain(labels, data, sprintf('-c %f -g %f -v %d', 2^C(i), 2^gamma(i), folds));
end
%# pair (C,gamma) with best accuracy
[~,idx] = max(cv_acc);
%# contour plot of parameter selection
contour(C, gamma, reshape(cv_acc,size(C))), colorbar
hold on
plot(C(idx), gamma(idx), 'rx')
text(C(idx), gamma(idx), sprintf('Acc = %.2f %%',cv_acc(idx)), 'HorizontalAlign','left', 'VerticalAlign','top')
hold off
xlabel('log_2(C)'), ylabel('log_2(\gamma)'), title('Cross-Validation Accuracy')
%# now you can train your model using best_C and best_gamma
best_C = 2^C(idx);
best_gamma = 2^gamma(idx);
I know that in 5-fold cross-validation, 4/5 of the dataset is used for training and 1/5 for testing, with the test part rotating each time, and that I am using this to find the best C and gamma for the RBF kernel. However, in my dataset the first 1000 examples are positive while the last 3000 are all negative. Does cross-validation with svmtrain() shuffle the data, or could the 1/5 used for testing contain only negative examples? I am asking because if the data is not shuffled, the accuracy is not realistic.
I appreciate your assistance.

Cross-Validation with libsvm to find best parameters

To find the best parameters to use with libsvm, I used the code below. Instead of './heart_scale' I had a file containing positive and negative examples, each with a HOG vector, in libsvm format. I had 1000 positive examples and 4000 negative ones. However, they were stored in order, i.e. the first 1000 examples were positive and the rest were negative.
Question: I have started to doubt whether the accuracy returned by this code is the actual accuracy. From what I read about 5-fold cross-validation, it takes the first 4/5 of the data for training and the remaining 1/5 for testing. Does this mean that the test set could be all negative? Or are the examples picked randomly?
%# read some training data
[labels,data] = libsvmread('./heart_scale');
%# grid of parameters
folds = 5;
[C,gamma] = meshgrid(-5:2:15, -15:2:3);
%# grid search, and cross-validation
cv_acc = zeros(numel(C),1);
for i=1:numel(C)
cv_acc(i) = svmtrain(labels, data, ...
sprintf('-c %f -g %f -v %d', 2^C(i), 2^gamma(i), folds));
end
%# pair (C,gamma) with best accuracy
[~,idx] = max(cv_acc);
%# contour plot of parameter selection
contour(C, gamma, reshape(cv_acc,size(C))), colorbar
hold on
plot(C(idx), gamma(idx), 'rx')
text(C(idx), gamma(idx), sprintf('Acc = %.2f %%',cv_acc(idx)), ...
'HorizontalAlign','left', 'VerticalAlign','top')
hold off
xlabel('log_2(C)'), ylabel('log_2(\gamma)'), title('Cross-Validation Accuracy')
%# now you can train your model using best_C and best_gamma
best_C = 2^C(idx);
best_gamma = 2^gamma(idx);
%# ...
You can find the answer to your question in the LIBSVM source code.
See the function svm_cross_validation in svm.cpp.
As you can see there, for a classification problem LIBSVM first groups the examples by class and then shuffles them, so the ordering of your file does not leave a fold with only one class.
So, the answer to your question: yes, the accuracy returned by this code is the actual cross-validation accuracy.
Note: the accuracy estimate also depends on the nature of the data and on the number of cross-validation folds, and it is itself a random value with some distribution.
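The effect of grouping by class before shuffling can be illustrated in plain MATLAB. This is a sketch of the general idea of stratified folds, not a port of LIBSVM's svm_cross_validation; the labels mimic the ordered file from the question (1000 positives followed by 4000 negatives) and the variable names are mine.

```matlab
% Labels in the same order as the question: positives first, then negatives.
labels = [ones(1000,1); -ones(4000,1)];
folds = 5;

foldId = zeros(size(labels));
classes = unique(labels);
for c = 1:numel(classes)
    % Indices of this class, shuffled within the class.
    members = find(labels == classes(c));
    members = members(randperm(numel(members)));
    % Deal the shuffled members to folds 1..folds in turn.
    foldId(members) = mod((0:numel(members)-1)', folds) + 1;
end

% Count positives and negatives in each fold.
posPerFold = accumarray(foldId(labels == 1), 1, [folds 1]);
negPerFold = accumarray(foldId(labels == -1), 1, [folds 1]);
disp([posPerFold negPerFold]);   % each row is 200 800
```

Every fold ends up with the same 1:4 class ratio as the full dataset, even though the input was sorted by class.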

How to get the error vs. epochs (iterations) plot in matlab when using svm classification?

I use svmtrain to train on my data set and svmclassify to predict the test set. I want to look at the optimization process, i.e. the error vs. epochs (iterations) plot. I looked through the documentation and the code and found no information about this. The only thing I can control is the maximum number of iterations.
How can I get the error vs. epochs (iterations) plot in MATLAB when using SVM classification?
Here is the code I modified. It is not quite what I want, though: I want the error at each epoch. Has anybody done such an analysis before? Thank you.
Best regards!
%# load dataset
load fisheriris %# load iris dataset
Groups = ismember(species,'setosa'); %# create a two-class problem
MaxIterValue = 210; %# maximum iterations
ErrVsIter = zeros(MaxIterValue, 2); %# store error data
%# Control maximum iterations
for N = 200: MaxIterValue
% options.MaxIter = N;
option = statset('MaxIter', N);
%# 5-fold Cross-validation
k = 5;
cvFolds = crossvalind('Kfold', Groups, k); %# get indices of 5-fold CV
cp = classperf(Groups); %# init performance tracker
for i = 1:k %# for each fold
testIdx = (cvFolds == i); %# get indices of test instances
trainIdx = ~testIdx; %# get indices training instances
%# train an SVM model over training instances
svmModel = svmtrain(meas(trainIdx,:), Groups(trainIdx), ...
'options',option, 'Autoscale',true, 'Showplot',false, 'Method','QP', ...
'BoxConstraint',2e-1, 'kernel_function','linear');
%#plotperform(svmModel);
%# test using test instances
pred = svmclassify(svmModel, meas(testIdx,:), 'Showplot',false);
%# evaluate and update performance object
cp = classperf(cp, pred, testIdx);
end
%# get error rate
ErrVsIter(N, 1) = N;
ErrVsIter(N, 2) = cp.ErrorRate;
end
plot(ErrVsIter(1:MaxIterValue,1),ErrVsIter(1:MaxIterValue,2));
Your code is correct; the problem is that the SVM finds a perfect solution every time, so every run has CorrectRate = 1. Add cp.CorrectRate to your code to see this.
The cause is the line below:
Groups = ismember(species,'setosa');
This two-class problem is too easy for an SVM to solve.
Also, plot only the rows that the loop actually fills in:
plot(ErrVsIter(200:MaxIterValue,1),ErrVsIter(200:MaxIterValue,2));
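One way to see why this two-class problem is so easy is to look at a single feature. The sketch below only assumes the fisheriris data, where column 3 of meas is petal length; the variable names are illustrative.

```matlab
load fisheriris
Groups = ismember(species, 'setosa');   % same two-class problem as above

petalLen = meas(:, 3);                   % petal length in cm

% Largest setosa value versus smallest value of the other two species.
maxSetosa = max(petalLen(Groups));
minOthers = min(petalLen(~Groups));
fprintf('setosa max = %.1f, others min = %.1f\n', maxSetosa, minOthers);

% If the ranges do not overlap, one threshold on this feature already
% separates the classes, so any SVM reaches zero error almost immediately.
isSeparable = maxSetosa < minOthers;
disp(isSeparable);
```

With a gap between the two ranges, the error stays at zero regardless of MaxIter, which is why the plotted curve is flat.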

How can I get predicted values in SVM using MATLAB?

I am trying to get a column vector of predictions in MATLAB, but I don't quite know how to code it. My current code is:
load DataWorkspace.mat
groups = ismember(Num,'Yes');
k=10;
%# number of cross-validation folds:
%# If you have 50 samples, divide them into 10 groups of 5 samples each,
%# then train with 9 groups (45 samples) and test with 1 group (5 samples).
%# This is repeated ten times, with each group used exactly once as a test set.
%# Finally the 10 results from the folds are averaged to produce a single
%# performance estimation.
cvFolds = crossvalind('Kfold', groups, k);
cp = classperf(groups);
for i = 1:k
testIdx = (cvFolds == i);
trainIdx = ~testIdx;
svmModel = svmtrain(Data(trainIdx,:), groups(trainIdx), ...
'Autoscale',true, 'Showplot',false, 'Method','SMO', ...
'Kernel_Function','rbf');
pred = svmclassify(svmModel, Data(testIdx,:), 'Showplot',false);
%# evaluate and update performance object
cp = classperf(cp, pred, testIdx);
end
cp.CorrectRate
cp.CountingMatrix
The issue is that it actually calculates the accuracy 11 times in total: once for each of the 10 folds and one final time as an average. But if I take the individual predictions of each fold and print pred in each loop iteration, the accuracy understandably drops greatly.
However, I need a column vector of the predicted values for each row of the data. Any ideas on how I can modify the code?
The whole idea of cross-validation is to get an unbiased estimate of the performance of a classifier.
Once that is done, you usually just train a model over the entire data set. This model is then used to predict future instances.
So just do:
svmModel = svmtrain(Data, groups, ...);
pred = svmclassify(svmModel, otherData, ...);
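If a prediction for every row of the original data is still wanted, the out-of-fold predictions can be written back into one column vector inside the loop. The sketch below is an assumption-laden variant, not the code from the question: it swaps svmtrain/svmclassify for fitcsvm/predict and crossvalind for cvpartition (all from the Statistics and Machine Learning Toolbox in newer releases), and it uses the fisheriris data in place of DataWorkspace.mat.

```matlab
load fisheriris
groups = ismember(species, 'setosa');   % stand-in for the question's labels
k = 10;

c = cvpartition(groups, 'KFold', k);    % stratified 10-fold partition
allPred = false(size(groups));          % one prediction per row of the data

for i = 1:k
    trainIdx = training(c, i);          % logical index of training rows
    testIdx = test(c, i);               % logical index of held-out rows
    model = fitcsvm(meas(trainIdx,:), groups(trainIdx), ...
        'KernelFunction', 'rbf', 'Standardize', true);
    % Each row is predicted by the model that did not see it during training.
    allPred(testIdx) = predict(model, meas(testIdx,:));
end

accuracy = mean(allPred == groups);     % overall out-of-fold accuracy
disp(accuracy);
```

Because every row is in the test set of exactly one fold, allPred has one entry per row, and its accuracy is the cross-validated estimate rather than a per-fold figure.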