I need to keep track of the F1-score while tuning C & sigma in an SVM.
For example, the following code keeps track of the accuracy; I need to change it to track the F1-score instead, but I was not able to do that.
%# read some training data
[labels,data] = libsvmread('./heart_scale');
%# grid of parameters
folds = 5;
[C,gamma] = meshgrid(-5:2:15, -15:2:3);
%# grid search, and cross-validation
cv_acc = zeros(numel(C),1);
for i=1:numel(C)
cv_acc(i) = svmtrain(labels, data, ...
sprintf('-c %f -g %f -v %d', 2^C(i), 2^gamma(i), folds));
end
%# pair (C,gamma) with best accuracy
[~,idx] = max(cv_acc);
%# now you can train your model using best_C and best_gamma
best_C = 2^C(idx);
best_gamma = 2^gamma(idx);
%# ...
I have seen the following two links
Retraining after Cross Validation with libsvm
10 fold cross-validation in one-against-all SVM (using LibSVM)
I do understand that I have to first find the best C and gamma/sigma parameters over the training data, then use these two values to do a LEAVE-ONE-OUT cross-validation classification experiment.
So what I want now is to first do a grid search for tuning C & sigma.
Please note that I would prefer to use MATLAB's SVM functions and not LIBSVM.
Below is my code for LEAVE-ONE-OUT cross-validation classification.
clc
clear all
close all
a = load('V1.csv');
X = double(a(:,1:12));
Y = double(a(:,13));
% train data
datall=[X,Y];
A=datall;
n = 40;
ordering = randperm(n);
B = A(ordering, :);
good=B;
input=good(:,1:12);
target=good(:,13);
CVO = cvpartition(target,'leaveout',1);
cp = classperf(target); %# init performance tracker
svmModel=[];
for i = 1:CVO.NumTestSets %# for each fold
trIdx = CVO.training(i);
teIdx = CVO.test(i);
%# train an SVM model over training instances
svmModel = svmtrain(input(trIdx,:), target(trIdx), ...
'Autoscale',true, 'Showplot',false, 'Method','ls', ...
'BoxConstraint',0.1, 'Kernel_Function','rbf', 'RBF_Sigma',0.1);
%# test using test instances
pred = svmclassify(svmModel, input(teIdx,:), 'Showplot',false);
%# evaluate and update performance object
cp = classperf(cp, pred, teIdx);
end
%# get accuracy
accuracy=cp.CorrectRate*100
sensitivity=cp.Sensitivity*100
specificity=cp.Specificity*100
PPV=cp.PositivePredictiveValue*100
NPV=cp.NegativePredictiveValue*100
%# get confusion matrix
%# columns:actual, rows:predicted, last-row: unclassified instances
cp.CountingMatrix
recallP = sensitivity;
recallN = specificity;
precisionP = PPV;
precisionN = NPV;
f1P = 2*((precisionP*recallP)/(precisionP + recallP));
f1N = 2*((precisionN*recallN)/(precisionN + recallN));
aF1 = ((f1P+f1N)/2);
I have changed the code, but I am making some mistakes and I am getting errors:
a = load('V1.csv');
X = double(a(:,1:12));
Y = double(a(:,13));
% train data
datall=[X,Y];
A=datall;
n = 40;
ordering = randperm(n);
B = A(ordering, :);
good=B;
inpt=good(:,1:12);
target=good(:,13);
k=10;
cvFolds = crossvalind('Kfold', target, k); %# get indices of 10-fold CV
cp = classperf(target); %# init performance tracker
svmModel=[];
for i = 1:k
testIdx = (cvFolds == i); %# get indices of test instances
trainIdx = ~testIdx;
C = 0.1:0.1:1;
S = 0.1:0.1:1;
fscores = zeros(numel(C), numel(S)); %// Pre-allocation
for c = 1:numel(C)
for s = 1:numel(S)
vals = crossval(@(XTRAIN, YTRAIN, XVAL, YVAL)(fun(XTRAIN, YTRAIN, XVAL, YVAL, C(c), S(c))),inpt(trainIdx,:),target(trainIdx));
fscores(c,s) = mean(vals);
end
end
end
[cbest, sbest] = find(fscores == max(fscores(:)));
C_final = C(cbest);
S_final = S(sbest);
and the function:
function fscore = fun(XTRAIN, YTRAIN, XVAL, YVAL, C, S)
svmModel = svmtrain(XTRAIN, YTRAIN, ...
'Autoscale',true, 'Showplot',false, 'Method','ls', ...
'BoxConstraint', C, 'Kernel_Function','rbf', 'RBF_Sigma', S);
pred = svmclassify(svmModel, XVAL, 'Showplot',false);
cp = classperf(YVAL, pred)
%# get accuracy
accuracy=cp.CorrectRate*100
sensitivity=cp.Sensitivity*100
specificity=cp.Specificity*100
PPV=cp.PositivePredictiveValue*100
NPV=cp.NegativePredictiveValue*100
%# get confusion matrix
%# columns:actual, rows:predicted, last-row: unclassified instances
cp.CountingMatrix
recallP = sensitivity;
recallN = specificity;
precisionP = PPV;
precisionN = NPV;
f1P = 2*((precisionP*recallP)/(precisionP + recallP));
f1N = 2*((precisionN*recallN)/(precisionN + recallN));
fscore = ((f1P+f1N)/2);
end
So basically you want to take this line of yours:
svmModel = svmtrain(input(trIdx,:), target(trIdx), ...
'Autoscale',true, 'Showplot',false, 'Method','ls', ...
'BoxConstraint',0.1, 'Kernel_Function','rbf', 'RBF_Sigma',0.1);
put it in a loop that varies your 'BoxConstraint' and 'RBF_Sigma' parameters and then uses crossval to output the f1-score for that iteration's combination of parameters.
You can use a single for-loop exactly as in your libsvm code example (i.e. using meshgrid and 1:numel(), which is probably faster) or a nested for-loop. I'll use a nested loop here so that you have both approaches; a sketch of the single-loop version follows the nested-loop code below:
C = [0.001, 0.003, 0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30, 100, 300] %// you must choose your own set of values for the parameters that you want to test. You can either do it this way by explicitly typing out a list
S = 0:0.1:1 %// or you can do it this way using the : operator
fscores = zeros(numel(C), numel(S)); %// Pre-allocation
for c = 1:numel(C)
for s = 1:numel(S)
vals = crossval(@(XTRAIN, YTRAIN, XVAL, YVAL) fun(XTRAIN, YTRAIN, XVAL, YVAL, C(c), S(s)), input(trIdx,:), target(trIdx));
fscores(c,s) = mean(vals);
end
end
%// Then establish the C and S that gave you the best f-score. Don't forget that c and s are just indexes though!
[cbest, sbest] = find(fscores == max(fscores(:)));
C_final = C(cbest);
S_final = S(sbest);
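For reference, the single-loop variant mentioned above might look roughly like this (a sketch only; it reuses the same fun helper defined below and assumes the same input and target variables):
[Cgrid, Sgrid] = meshgrid([0.001 0.01 0.1 1 10 100], 0.1:0.1:1); %// pick your own ranges here
fscores = zeros(numel(Cgrid), 1);
for k = 1:numel(Cgrid)
    vals = crossval(@(XTRAIN, YTRAIN, XVAL, YVAL) ...
        fun(XTRAIN, YTRAIN, XVAL, YVAL, Cgrid(k), Sgrid(k)), ...
        input(trIdx,:), target(trIdx));
    fscores(k) = mean(vals); %// mean f-score over the crossval folds
end
[~, idx] = max(fscores);
C_final = Cgrid(idx);
S_final = Sgrid(idx);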
Now we just have to define the function fun. The docs have this to say about fun:
fun is a function handle to a function with two inputs, the training
subset of X, XTRAIN, and the test subset of X, XTEST, as follows:
testval = fun(XTRAIN,XTEST) Each time it is called, fun should use
XTRAIN to fit a model, then return some criterion testval computed on
XTEST using that fitted model.
So fun needs to:
output a single f-score
take as input a training and testing set for X and Y. Note that these are both subsets of your actual training set! Think of them more like a training and validation SUBSET of your training set. Also note that crossval will split these sets up for you!
Train a classifier on the training subset (using your current C and S parameters from your loop)
RUN your new classifier on the test (or validation rather) subset
Compute and output a performance metric (in your case you want the f1-score)
You'll notice that fun can't take any extra parameters, which is why I've wrapped it in an anonymous function so that we can pass the current C and S values in (i.e. all that @(...)fun(...) stuff above). That's just a trick to "convert" our six-parameter fun into the four-parameter one required by crossval.
function fscore = fun(XTRAIN, YTRAIN, XVAL, YVAL, C, S)
svmModel = svmtrain(XTRAIN, YTRAIN, ...
'Autoscale',true, 'Showplot',false, 'Method','ls', ...
'BoxConstraint', C, 'Kernel_Function','rbf', 'RBF_Sigma', S);
pred = svmclassify(svmModel, XVAL, 'Showplot',false);
CP = classperf(YVAL, pred)
fscore = ... %// You can do this bit the same way you did earlier
end
I found the only problem was with target(trainIdx). It's a row vector, so I just replaced target(trainIdx) with target(trainIdx)', which is a column vector.
Related
I have pretty simple code for binary classification (see below). When I re-run it in MATLAB (just by manually pressing the "run" button), each run gives me slightly different accuracies for each of the 14 subjects. However, if I loop over my code nrPermute times, every iteration of the loop gives me EXACTLY the same accuracy for the respective subject. Why is that? So in the first code, mean(accuracy) differs between runs, whereas in the second code it is always the same across iterations. Both codes are below.
Code where only one 10-fold cross-validation is done for each subject:
%% SVM-Classification
nrFolds = 10; %number of folds of crossvalidation, 10 is standard
kernel = 'linear'; % 'linear', 'rbf' or 'polynomial'
C = 1;
solver = 'L1QP';
cvFolds = crossvalind('Kfold', labels, nrFolds);
for k = 1:14
for i = 1:nrFolds % iterate through each fold
testIdx = (cvFolds == i); % indices of test instances
trainIdx = ~testIdx; % indices training instances
% train the SVM
cl = fitcsvm(features(trainIdx,:), ...
labels(trainIdx),'KernelFunction',kernel,'Standardize',true,...
'BoxConstraint',C,'ClassNames',[0,1],'Solver',solver);
[label,scores] = predict(cl, features(testIdx,:));
eq = sum(label==labels(testIdx));
accuracy(i) = eq/numel(labels(testIdx));
end
crossValAcc(k) = mean(accuracy);
end
Code where each 10-fold cross-validation is repeated nrPermute times:
%% SVM-Classification
nrFolds = 10; %number of folds of crossvalidation, 10 is standard
kernel = 'linear'; % 'linear', 'rbf' or 'polynomial'
C = 1;
solver = 'L1QP';
cvFolds = crossvalind('Kfold', labels, nrFolds);
nrPermute = 5;
for k = 1:14
for p = 1:nrPermute
for i = 1:nrFolds % iterate through each fold
testIdx = (cvFolds == i); % indices of test instances
trainIdx = ~testIdx; % indices training instances
% train the SVM
cl = fitcsvm(features(trainIdx,:), ...
labels(trainIdx),'KernelFunction',kernel,'Standardize',true,...
'BoxConstraint',C,'ClassNames',[0,1],'Solver',solver);
[label,scores] = predict(cl, features(testIdx,:));
eq = sum(label==labels(testIdx));
accuracy(i) = eq/numel(labels(testIdx));
end
accSubj(p) = mean(accuracy); % accuracy of each permutation
end
crossValAcc(k) = mean(accSubj);
end
In case it is useful for someone else as well, I figured it out: the call cvFolds = crossvalind('Kfold', labels, nrFolds); should go inside the permutation loop, so that the distribution into folds is re-shuffled for every permutation!
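A minimal sketch of that fix, reusing the variables from the code above and shown for a single subject; only the placement of crossvalind changes:
nrPermute = 5;
accSubj = zeros(1, nrPermute);
for p = 1:nrPermute
    cvFolds = crossvalind('Kfold', labels, nrFolds); % re-shuffled split for every permutation
    accuracy = zeros(1, nrFolds);
    for i = 1:nrFolds
        testIdx  = (cvFolds == i);
        trainIdx = ~testIdx;
        cl = fitcsvm(features(trainIdx,:), labels(trainIdx), ...
            'KernelFunction',kernel, 'Standardize',true, ...
            'BoxConstraint',C, 'ClassNames',[0,1], 'Solver',solver);
        label = predict(cl, features(testIdx,:));
        accuracy(i) = mean(label == labels(testIdx));
    end
    accSubj(p) = mean(accuracy); % accuracy of this permutation
end
crossValAcc = mean(accSubj);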
I'm constructing an algorithm that uses the BFGS method to find the parameters of a logistic regression for a binary dataset in Octave.
Now, I'm struggling with something I believe is an overfitting problem. I run the algorithm for several datasets and it actually converges to the same results as Octave's fminunc function. However, for a specific "type of dataset" the algorithm converges to very high values of the parameters, contrary to fminunc, which gives reasonable values for these parameters. I added a regularization term and actually got my algorithm to converge to the same values as fminunc.
This specific type of dataset has data that can be completely separated by a straight line. My question is: why is this a problem for the BFGS method but not for fminunc? How does this function avoid the issue without regularization? Could I implement this in my algorithm?
The code of my algorithm is the following:
function [beta] = Log_BFGS(data, L_0)
clc
close
%************************************************************************
%************************************************************************
%Loading the data:
[n, e] = size(data);
d = e - 1;
n; %Number of observations.
d; %Number of features.
Y = data(:, e); %Labels' values
X_o = data(:, 1:d);
X = [ones(n, 1) X_o]; %Features values
%Initial conditions:
beta_0 = zeros(e, 1);
beta = [];
beta(:, 1) = beta_0;
N = 600; %Max iterations
Tol = 1e-10; %Tolerance
error = .1;
L = L_0; %Regularization parameter
B = eye(e);
options = optimset('GradObj', 'on', 'MaxIter', 600);
[beta_s] = fminunc(@(t)(costFunction(t, X, Y, L)), beta_0, options);
disp('Beta obtained with the fminunc function');
disp("--------------");
disp(beta_s)
k = 1;
a_0 = 1;
% Define the sigmoid function
h = inline('1.0 ./ (1.0 + exp(-z))');
while (error > Tol && k < N)
beta_k = beta(:, k);
x_0 = X*beta_k;
h_0 = h(x_0);
beta_r = [0 ; beta(:, k)(2:e, :)];
g_k = ((X)'*(h_0 - Y) + L*beta_r)/n;
d_k = -pinv(B)*g_k;
a = 0.1; %I'll implement an Armijo line search here (soon)
beta(:, k+1) = beta(:, k) + a*d_k;
beta_k_1 = beta(:, k+1);
x_1 = X*beta_k_1;
h_1 = h(x_1);
beta_s = [0 ; beta(:, k+1)(2:e, :)];
g_k_1 = (transpose(X)*(h_1 - Y) + L*beta_s)/n;
s_k = beta(:, k+1) - beta(:, k);
y_k = g_k_1 - g_k;
B = B - B*s_k*s_k'*B/(s_k'*B*s_k) + y_k*y_k'/(s_k'*y_k);
k = k + 1;
error = norm(d_k);
endwhile
%Accuracy of the logistic model:
p = zeros(n, 1);
for j = 1:n
if (1./(1. + exp(-1.*(X(j, :)*beta(:, k)))) >= 0.5)
p(j) = 1;
else
p(j) = 0;
endif
endfor
R = mean(double(p == Y));
beta = beta(:, k);
%Showing the results:
disp("Estimation of logistic regression model Y = 1/(1 + e^(beta*X)),")
disp("using the algorithm BFGS =")
disp("--------------")
disp(beta)
disp("--------------")
disp("with a convergence error in the last iteration of:")
disp(error)
disp("--------------")
disp("and a total number of")
disp(k-1)
disp("iterations")
disp("--------------")
if k == N
disp("The maximum number of iterations was reached before obtaining the desired error")
else
disp("The desired error was reached before reaching the maximum of iterations")
endif
disp("--------------")
disp("The precision of the logistic regression model is given by (max 1.0):")
disp("--------------")
disp(R)
disp("--------------")
endfunction
The results I got for the dataset are shown in the following picture. If you need the data used in this situation, please let me know.
Results of the algorithm
Check the objectives!
The values of the solution vector are nice, but the whole optimization is driven by the objective. You say fminunc gives reasonable values for these parameters, but "reasonable" is not defined within this model.
It would not be impossible that both your low-value and your high-value solutions achieve pretty much the same objective, and that is all those solvers care about (when no regularization term is used).
So the important question is: is there a unique solution (which would disallow these results)? Only when your dataset has full rank! So maybe your data is rank-deficient and you obtain two equally good solutions. Of course there might be slight differences due to numerical issues, which are always a source of errors, especially in more complex optimization algorithms.
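A quick way to see this is to evaluate the (unregularized) objective at both solutions. A minimal sketch, assuming X, Y and the two coefficient vectors (here called beta_bfgs and beta_fminunc, names of my own choosing) are in the workspace:
sigmoid = @(z) 1.0 ./ (1.0 + exp(-z));
%Negative log-likelihood, with probabilities clipped away from 0 for numerical safety
nll = @(beta) -mean(Y .* log(max(sigmoid(X*beta), eps)) + ...
              (1 - Y) .* log(max(1 - sigmoid(X*beta), eps)));
disp("objective at the BFGS solution:")
disp(nll(beta_bfgs))
disp("objective at the fminunc solution:")
disp(nll(beta_fminunc))
If both numbers are essentially equal, the solvers have no reason to prefer one coefficient vector over the other without a regularization term.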
I'm trying to use SVM-RFE with the libsvm library on a gene-expression dataset. My algorithm is written in MATLAB. This particular dataset achieves 80+% classification accuracy under 5-fold CV without feature selection. When I tried to apply SVM-RFE to this dataset (same SVM parameter settings and 5-fold CV), the classification result became worse, achieving only about 60+% accuracy.
Here is my MATLAB code; I would appreciate it if anyone could shed some light on what's wrong with it. Thank you in advance.
[label, data] = libsvmread('libsvm_data.scale');
[N D] = size(data);
numfold=5;
indices = crossvalind ('Kfold',label, numfold);
cp = classperf(label);
for i= 1:numfold
disp(strcat('Fold-',int2str(i)));
testix = (indices == i); trainix = ~testix;
test_data = data(testix,:); test_label = label(testix);
train_data = data(trainix,:); train_label = label(trainix);
model = svmtrain(train_label, train_data, '-s 0 -t 0');
s = 1:D;
r = [];
iter = 1;
while ~isempty(s)
X = train_data(:,s);
fs_model = svmtrain(train_label, X, sprintf('-s 0 -t %f -c %f -g %f -b 1', kernel, cost, gamma));
w = fs_model.SVs' * fs_model.sv_coef; %'
c = w.^2;
[c_minvalue, f] = min(c);
r = [s(f),r];
ind = [1:f-1, f+1:length(s)];
s = s(ind);
iter = iter + 1;
end
predefined = 100;
important_feat = r(:,D-predefined+1:end);
for l=1:length(important_feat)
testdata(:,l) = test_data (:,important_feat(l));
end
[predict_label_itest, accuracy_itest, prob_values] = svmpredict(test_label, testdata, model,'-b 1');
acc_itest_fs (:,i) = accuracy_itest(1);
clear testdata;
end
Mean_itest_fs = mean((acc_itest_fs),2);
Mean_bac_fs = mean (bac_fs,2);
After applying RFE to the training data, you get a (feature) subset of the training data. So when you use the training data to train a model, I think you should train that model on this subset of the training data.
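In other words, something along these lines (a sketch only, reusing the variable names from the question; the final model is also trained with -b 1 here so that svmpredict can be called with -b 1):
traindata_fs = train_data(:, important_feat);  % RFE-selected feature columns of the training data
testdata_fs  = test_data(:, important_feat);   % the same columns of the test data
model_fs = svmtrain(train_label, traindata_fs, '-s 0 -t 0 -b 1');
[predict_label_itest, accuracy_itest, prob_values] = svmpredict(test_label, testdata_fs, model_fs, '-b 1');
acc_itest_fs(:,i) = accuracy_itest(1);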
I am trying to use libsvm with MATLAB to evaluate a one-vs-all SVM; the only issue is that my dataset is not big enough to warrant selecting a specific test set. Thus, I want to evaluate my classifiers using leave-one-out.
I am not particularly experienced in using SVMs, so forgive me if I am a little bit confused as to what to do. I need to generate precision vs. recall curves and confusion matrices for my classifiers, but I have no idea where to start.
I've given it a go and came up with the following as a rough start for the leave-one-out training, but I'm not sure how to do the evaluation.
function model = do_leave_one_out(labels, data)
acc = [];
bestC = [];
bestG = [];
for ii = 1:length(data)
% Training data for this iteration
trainData = data;
trainData(ii,:) = [];
looLabel = labels(ii);
trainLabels = labels;
trainLabels(ii) = [];
% Do grid search to find the best parameters?
acc(ii) = bestReportedAccuracy;
bestC(ii) = bestValueForC;
bestG(ii) = bestValueForG;
end
% After this I am not sure how to train and evaluate the final model
end
Here are some modules that you may be interested in; you can incorporate them into your function. Hope it helps.
Leave-one-out:
scrambledList = randperm(totalNumberOfData);
trainingData = Data(scrambledList(1:end-1),:);
trainingLabel = Label(scrambledList(1:end-1));
testData = Data(scrambledList(end),:);
testLabel = Label(scrambledList(end));
Grid Search (Dual-class case):
acc = 0;
for log2c = -1:3,
for log2g = -4:1,
cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
cv = svmtrain(trainingLabel, trainingData, cmd);
if (cv >= acc),
acc = cv; bestC = 2^log2c; bestG = 2^log2g;
end
end
end
One-vs-all (Used for Multi-class case):
model = cell(NumofClass,1);
for k = 1:NumofClass
model{k} = svmtrain(double(trainingLabel==k), trainingData, '-c 1 -g 0.2 -b 1');
end
%% calculate the probability of different labels
pr = zeros(1,NumofClass);
for k = 1:NumofClass
[~,~,p] = svmpredict(double(testLabel==k), testData, model{k}, '-b 1');
pr(:,k) = p(:,model{k}.Label==1); %# probability of class==k
end
%% your label prediction will be the one with highest probability:
[~,predictedLabel] = max(pr,[],2);
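Since the question also asks about precision/recall and confusion matrices, here is a minimal sketch of how those could be computed once the leave-one-out predictions are collected. trueLabels and predLabels are assumed vectors (my own names, not from the code above) holding the actual and the predicted class of every left-out sample, with classes numbered 1..NumofClass:
confMat = zeros(NumofClass);                      %# rows: actual class, columns: predicted class
for n = 1:numel(trueLabels)
    confMat(trueLabels(n), predLabels(n)) = confMat(trueLabels(n), predLabels(n)) + 1;
end
precision = diag(confMat)' ./ sum(confMat, 1);    %# per-class precision
recall    = diag(confMat)' ./ sum(confMat, 2)';   %# per-class recall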
I'm implementing PCA using eigenvalue decomposition for sparse data. I know MATLAB has PCA implemented, but writing the code myself helps me understand all the technicalities.
I've been following the guidance from here, but I'm getting different results compared to the built-in function princomp.
Could anybody look at it and point me in the right direction?
Here's the code:
function [mu, Ev, Val ] = pca(data)
% mu - mean image
% Ev - matrix whose columns are the eigenvectors corresponding to the eigen
% values Val
% Val - eigenvalues
if nargin ~= 1
error ('usage: [mu,E,Values] = pca_q1(data)');
end
mu = mean(data)';
nimages = size(data,2);
for i = 1:nimages
data(:,i) = data(:,i)-mu(i);
end
L = data'*data;
[Ev, Vals] = eig(L);
[Ev,Vals] = sort(Ev,Vals);
% computing eigenvector of the real covariance matrix
Ev = data * Ev;
Val = diag(Vals);
Vals = Vals / (nimages - 1);
% normalize Ev to unit length
proper = 0;
for i = 1:nimages
Ev(:,i) = Ev(:,1)/norm(Ev(:,i));
if Vals(i) < 0.00001
Ev(:,i) = zeros(size(Ev,1),1);
else
proper = proper+1;
end;
end;
Ev = Ev(:,1:nimages);
Here's how I would do it:
function [V newX D] = myPCA(X)
X = bsxfun(@minus, X, mean(X,1)); %# zero-center
C = (X'*X)./(size(X,1)-1); %# cov(X)
[V D] = eig(C);
[D order] = sort(diag(D), 'descend'); %# sort cols high to low
V = V(:,order);
newX = X*V(:,1:end);
end
and an example to compare against the PRINCOMP function from the Statistics Toolbox:
load fisheriris
[V newX D] = myPCA(meas);
[PC newData Var] = princomp(meas);
You might also be interested in this related post about performing PCA by SVD.
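For reference, a rough sketch of that SVD route (an illustrative variant of myPCA above, not code from the linked post):
function [V newX D] = myPCAsvd(X)
    X = bsxfun(@minus, X, mean(X,1));   %# zero-center
    [U S V] = svd(X, 'econ');           %# X = U*S*V', singular values already sorted high to low
    D = diag(S).^2 ./ (size(X,1)-1);    %# variances, i.e. the eigenvalues of cov(X)
    newX = U*S;                         %# projected data (scores); identical to X*V
end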