I know that cross-validation is used for selecting good parameters. After finding them, I need to re-train on the whole data without the -v option.
But the problem I face is that after I train with the -v option, I only get the cross-validation accuracy (e.g. 85%). There is no model, and I can't see the values of C and gamma. In that case, how do I retrain?
By the way, I am applying 10-fold cross-validation.
For example, the output looks like this:
optimization finished, #iter = 138
nu = 0.612233
obj = -90.291046, rho = -0.367013
nSV = 165, nBSV = 128
Total nSV = 165
Cross Validation Accuracy = 98.1273%
I need some help with this.
To get the best C and gamma, I use this code, which is available in the LIBSVM FAQ:
bestcv = 0;
for log2c = -6:10,
  for log2g = -6:3,
    cmd = ['-v 5 -c ', num2str(2^log2c), ' -g ', num2str(2^log2g)];
    cv = svmtrain(TrainLabel, TrainVec, cmd);
    if (cv >= bestcv),
      bestcv = cv; bestc = 2^log2c; bestg = 2^log2g;
    end
    fprintf('(best c=%g, g=%g, rate=%g)\n', bestc, bestg, bestcv);
  end
end
Another question: is the cross-validation accuracy obtained with the -v option similar to the accuracy we get when we train without -v and use that model to predict? Are the two accuracies similar?
Another question: cross-validation basically improves the accuracy of the model by avoiding overfitting, so it needs to have a model in place before it can improve it. Am I right? Besides that, if I have a different model, will the cross-validation accuracy be different? Am I right?
One more question: for the reported cross-validation accuracy, what are the values of C and gamma?
The graph looks something like this:
From it, the best values are C = 2 and gamma = 0.0078125. But when I retrain the model with these parameters, the accuracy is not the same 99.63%. What could be the reason?
Thanks in advance.
The -v option is really meant as a way to avoid the overfitting problem: instead of using the whole data for training, it performs N-fold cross-validation, training on N-1 folds and testing on the remaining fold, one fold at a time, and then reports the average accuracy. Thus it only returns the cross-validation accuracy (assuming you have a classification problem; otherwise the mean squared error for regression) as a scalar number instead of an actual SVM model.
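To make the "a scalar, not a model" point concrete, here is a minimal Python sketch of what a k-fold cross-validation routine computes. It is not LIBSVM code: the nearest-centroid classifier (`train_centroids`, `predict_centroids`) is a hypothetical stand-in for svmtrain/svmpredict, and all names are illustrative. Every fold trains a throwaway model, and only the held-out accuracy survives.

```python
import random


def kfold_indices(n, k, seed=0):
    """Shuffle range(n) and deal it into k folds of (almost) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_centroids(X, y):
    """Hypothetical stand-in for svmtrain: one mean vector per class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(x))
        for j, v in enumerate(x):
            acc[j] += v
        counts[label] = counts.get(label, 0) + 1
    return {c: [v / counts[c] for v in s] for c, s in sums.items()}


def predict_centroids(model, x):
    """Hypothetical stand-in for svmpredict: label of the nearest centroid."""
    return min(model, key=lambda c: sum((a - b) ** 2 for a, b in zip(x, model[c])))


def cross_val_accuracy(X, y, k=5, seed=0):
    """Train on k-1 folds and test on the held-out fold, k times.

    Returns a single number (accuracy in percent, pooled over all held-out
    samples); the k per-fold models are thrown away.
    """
    correct = 0
    for test_idx in kfold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = train_centroids([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(predict_centroids(model, X[i]) == y[i] for i in test_idx)
    return 100.0 * correct / len(X)
```

Because each of the k models is discarded, you still have to train one final model on all the data with the chosen parameters.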
If you want to perform model selection, you have to implement a grid search using cross-validation (similar to the grid.py helper Python script) to find the best values of C and gamma.
This shouldn't be hard to implement: create a grid of values using MESHGRID, iterate over all pairs (C, gamma), train an SVM model with, say, 5-fold cross-validation for each pair, and choose the values with the best CV accuracy.
Example:
%# read some training data
[labels,data] = libsvmread('./heart_scale');
%# grid of parameters
folds = 5;
[C,gamma] = meshgrid(-5:2:15, -15:2:3);
%# grid search, and cross-validation
cv_acc = zeros(numel(C),1);
for i=1:numel(C)
cv_acc(i) = svmtrain(labels, data, ...
sprintf('-c %f -g %f -v %d', 2^C(i), 2^gamma(i), folds));
end
%# pair (C,gamma) with best accuracy
[~,idx] = max(cv_acc);
%# contour plot of parameter selection
contour(C, gamma, reshape(cv_acc,size(C))), colorbar
hold on
plot(C(idx), gamma(idx), 'rx')
text(C(idx), gamma(idx), sprintf('Acc = %.2f %%',cv_acc(idx)), ...
'HorizontalAlign','left', 'VerticalAlign','top')
hold off
xlabel('log_2(C)'), ylabel('log_2(\gamma)'), title('Cross-Validation Accuracy')
%# now you can train your model using best_C and best_gamma
best_C = 2^C(idx);
best_gamma = 2^gamma(idx);
%# ...
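The same grid search can be sketched in Python, assuming a caller-supplied `cv_score(C, gamma)` function that plays the role of `svmtrain(..., '-v 5')` (the name is hypothetical, and the toy objective below is made up purely so the sketch runs). Iterating over `itertools.product` of the two log2 ranges visits every (C, gamma) pair, just like looping over the MESHGRID output.

```python
import itertools
import math


def grid_search(cv_score, log2c_range, log2g_range):
    """Try every (C, gamma) pair on a log2 grid; return (score, C, gamma) of the best.

    cv_score(C, gamma) is a caller-supplied (hypothetical) function that
    should return a cross-validation accuracy, e.g. by wrapping a '-v 5' run.
    """
    best = None
    for log2c, log2g in itertools.product(log2c_range, log2g_range):
        C, gamma = 2.0 ** log2c, 2.0 ** log2g
        score = cv_score(C, gamma)
        # strict '>' keeps the first pair that reaches the best score
        if best is None or score > best[0]:
            best = (score, C, gamma)
    return best


# Toy objective standing in for a real CV run: peaks at log2(C) = 1, log2(gamma) = -7.
def toy_cv_score(C, gamma):
    return -((math.log2(C) - 1) ** 2 + (math.log2(gamma) + 7) ** 2)


# Same grid as meshgrid(-5:2:15, -15:2:3) in the MATLAB code above.
best_score, best_C, best_gamma = grid_search(toy_cv_score, range(-5, 16, 2), range(-15, 4, 2))
```

Using strict `>` keeps the first best pair, like MATLAB's `max`; the FAQ loop in the question uses `>=`, which keeps the last one, so with ties the two can report different (C, gamma).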
If you use your entire dataset to determine your parameters and then train on that same dataset, you are going to overfit your data. Ideally, you would divide the dataset, do the parameter search on one portion (with CV), then use the other portion to train and test with CV. Will you get better results if you use the whole dataset for both? Of course, but your model is likely not to generalize well. If you want to determine the true performance of your model, you need to do parameter selection separately.
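The separation described above can be sketched in Python as follows. `search` and `cv_estimate` are hypothetical caller-supplied callbacks (for example, a grid search with CV, and a CV run with fixed parameters); the point of the sketch is only that the held-out portion never influences which parameters are chosen.

```python
import random


def split_for_selection(n, holdout_fraction=0.25, seed=0):
    """Shuffle sample indices into two disjoint parts: one for the parameter
    search (with CV), one kept aside for estimating performance."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_hold = int(round(n * holdout_fraction))
    return idx[n_hold:], idx[:n_hold]


def select_then_estimate(X, y, search, cv_estimate, holdout_fraction=0.25, seed=0):
    """search(X, y) -> params and cv_estimate(X, y, params) -> score are
    hypothetical callbacks supplied by the caller."""
    sel, hold = split_for_selection(len(X), holdout_fraction, seed)
    params = search([X[i] for i in sel], [y[i] for i in sel])
    # The held-out part is only touched after params have been fixed.
    score = cv_estimate([X[i] for i in hold], [y[i] for i in hold], params)
    return params, score
```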
I know that LIBSVM only supports one-vs-one classification for multi-class SVM. However, I would like to tweak it a bit to perform one-against-all classification. I have tried to implement one-against-all below. Is this the correct approach?
The code:
TrainLabel; TrainVec; TestVec; TestLabel;
u = unique(TrainLabel);
N = length(u);
if (N > 2)
    itr = 1;
    classes = 0;
    while ((classes ~= 1) && (itr <= length(u)))
        c1 = (TrainLabel == u(itr));
        newClass = c1;
        model = svmtrain(TrainLabel, TrainVec, '-c 1 -g 0.00154');
        [predict_label, accuracy, dec_values] = svmpredict(TestLabel, TestVec, model);
        itr = itr + 1;
    end
    itr = itr - 1;
end
I might have made some mistakes. I would like to hear some feedback. Thanks.
Second Part:
As grapeot said:
I need to do sum-pooling (or voting, as a simplified solution) to come up with the final answer, but I am not sure how to do it. I looked at the Python file but am still not sure. I need some help with this.
%# Fisher Iris dataset
load fisheriris
[~,~,labels] = unique(species); %# labels: 1/2/3
data = zscore(meas); %# scale features
numInst = size(data,1);
numLabels = max(labels);
%# split training/testing
idx = randperm(numInst);
numTrain = 100; numTest = numInst - numTrain;
trainData = data(idx(1:numTrain),:); testData = data(idx(numTrain+1:end),:);
trainLabel = labels(idx(1:numTrain)); testLabel = labels(idx(numTrain+1:end));
%# train one-against-all models
model = cell(numLabels,1);
for k=1:numLabels
model{k} = svmtrain(double(trainLabel==k), trainData, '-c 1 -g 0.2 -b 1');
end
%# get probability estimates of test instances using each model
prob = zeros(numTest,numLabels);
for k=1:numLabels
[~,~,p] = svmpredict(double(testLabel==k), testData, model{k}, '-b 1');
prob(:,k) = p(:,model{k}.Label==1); %# probability of class==k
end
%# predict the class with the highest probability
[~,pred] = max(prob,[],2);
acc = sum(pred == testLabel) ./ numel(testLabel) %# accuracy
C = confusionmat(testLabel, pred) %# confusion matrix
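Picking, for each test sample, the class whose "k vs rest" model gives the highest score is what `[~,pred] = max(prob,[],2)` does in the code above. Here is a Python sketch of that combination step plus the accuracy and a confusion matrix, assuming the scores are already collected in a samples-by-classes table; the function names are illustrative, not LIBSVM API.

```python
def one_vs_all_predict(scores):
    """scores[i][k] is the score that binary model 'class k+1 vs rest' gives
    sample i (a probability estimate or a sign-corrected decision value).
    Picks the column with the highest score; ties go to the lowest label,
    as with MATLAB's max. Returns 1-based labels."""
    return [max(range(len(row)), key=lambda k: row[k]) + 1 for row in scores]


def accuracy(true, pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(true, pred)) / len(true)


def confusion_matrix(true, pred, num_labels):
    """cm[i][j] counts samples of true class i+1 predicted as class j+1
    (the same layout as MATLAB's confusionmat)."""
    cm = [[0] * num_labels for _ in range(num_labels)]
    for t, p in zip(true, pred):
        cm[t - 1][p - 1] += 1
    return cm
```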
From the code I can see you are trying to first turn the labels into "some class" vs "not this class", and then invoke LibSVM to do training and testing. Some questions and suggestions:
Why are you using the original TrainLabel for training? In my opinion it should be model = svmtrain(double(newClass), TrainVec, '-c 1 -g 0.00154'); (LIBSVM's MATLAB interface expects a double label vector, and newClass is logical).
With the modified training mechanism, you also need to tweak the prediction part, e.g. use sum-pooling to determine the final label. Using the -b switch in LibSVM to enable probability output may also improve the accuracy.
Instead of probability estimates, you can also use the decision values as follows:
[~,~,d] = svmpredict(double(testLabel==k), testData, model{k});
prob(:,k) = d * (2 * model{k}.Label(1) - 1);
to achieve the same purpose.
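The factor `(2 * model{k}.Label(1) - 1)` is there because LIBSVM reports binary decision values as positive for the first label in `model.Label`, which may be either 1 or 0 for a "class k vs rest" model. A small Python sketch of that sign correction, with hypothetical helper names; the resulting table can go through the same argmax step as the probabilities.

```python
def oriented_decision_values(d, first_label):
    """LIBSVM's binary decision values are positive for model.Label(1).
    For a 'class k (label 1) vs rest (label 0)' model, multiply by
    2*first_label - 1 so that larger always means 'more like class k'."""
    sign = 2 * first_label - 1  # +1 if Label(1) == 1, -1 if Label(1) == 0
    return [v * sign for v in d]


def score_table(per_model):
    """per_model[k] = (decision values of model k, its first label).
    Returns a samples-by-classes table, like prob in the MATLAB code."""
    columns = [oriented_decision_values(d, first) for d, first in per_model]
    return [list(row) for row in zip(*columns)]
```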
I want to perform cross-validation to select the best parameters gamma and C for the RBF kernel of SVR (support vector regression). I'm using LIBSVM. I have a database that contains 4 groups of 3D meshes.
My question is:
Is the approach I am using OK for 4-fold cross-validation? I think that, to select the parameters C and gamma of the RBF kernel, I must minimize the error between the predicted values and the ground-truth values.
I also have another problem: I get a NaN value during cross-validation (Squared correlation coefficient = nan (regression)).
Here is the code I wrote:
[C,gamma] = meshgrid(-5:2:15, -15:2:3); % range of values for C and gamma
%# grid search, and cross-validation
for m=1:numel(C)
for k=1:4
fid1 = fopen(sprintf('list_learning_%d.txt',k), 'rt');
i=1;
while feof(fid1) == 0
tline = fgetl(fid1);
v= load(tline);
v=normalize(v);
matrix_feature_tmp(i,:)=v;
i=i+1;
end
fclose(fid1);
% I fill matrix_feature_train of size m by n via matrix_feature_tmp
%%construction of the test matrix
fid2 = fopen(sprintf('liste_features_test%d.txt',k), 'rt');
i=1;
while feof(fid2) == 0
tline = fgetl(fid2);
v= load(tline);
v=normalize(v);
matrice_feature_test_tmp(i,:)=v;
i=i+1;
end
fclose(fid2);
%I fill matrix_feature_test of size m by k via matrix_feature_test_tmp
mos_learning=load(sprintf('mos_learning_%d.txt',k));
mos_wanted=load(sprintf('mos_test%d.txt',k));
model = svmtrain(mos_learning, matrix_feature_train', ...
    sprintf('-s %f -t %f -c %f -g %f -p %f ', 3, 2, 2^C(m), 2^gamma(m), 1));
[y_hat, Acc, projection] = svmpredict(mos_wanted, ...
    matrix_feature_test', model);
MSE_Test = mean((y_hat-mos_wanted).^2);
vecc_error(k)=MSE_Test;
end
mean_vec_error_fold(m)=mean(vecc_error);
end
%select the best gamma and C
[~,idx]=min(mean_vec_error_fold);
best_C = 2^C(idx);
best_gamma = 2^gamma(idx);
%training with best parameters
%for example
model = svmtrain(mos_learning1, matrice_feature_train1', ...
    sprintf('-s %f -t %f -c %f -g %f -p %f ', 3, 2, best_C, best_gamma, 1));
[y_hat_final, Acc, projection] = svmpredict(mos_test1, matrice_feature_test1', ...
    model);
Based on your description, without reading your code, it sounds like you are NOT doing cross-validation. Cross-validation requires you to pick a parameter set (i.e. a value for C and gamma) and, holding those parameters constant, use k-1 folds to train and 1 fold to test, doing this k times so that each fold is used as the test set exactly once. Then aggregate the error/accuracy measure over these k tests; call this the cross-validation error for the parameter set you used, and use it to rank those parameters for a model trained on ALL the data. You then repeat this process for a range of different parameters and choose the parameter set with the best accuracy / lowest CV error. Your final model is trained on all your data.
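The procedure just described (fixed parameters, k train/test runs, averaged test error, then the minimum over parameter sets) can be sketched in Python. `fit` and `predict` are hypothetical caller-supplied functions standing in for svmtrain/svmpredict with `-s 3`, and the contiguous folds mirror the question's four pre-split groups; the toy "shrunk mean" regressor exists only to make the sketch runnable.

```python
def kfold_split(n, k):
    """Contiguous folds: every index lands in exactly one test fold."""
    assert 1 < k <= n, "need at least 2 folds and one sample per fold"
    bounds = [i * n // k for i in range(k + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]


def cv_mse(X, y, params, fit, predict, k=4):
    """Hold params fixed; train on k-1 folds, test on the remaining fold,
    k times; return the mean of the k test MSEs (the CV error)."""
    fold_errors = []
    for test_idx in kfold_split(len(X), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx], params)
        sq = [(predict(model, X[i]) - y[i]) ** 2 for i in test_idx]
        fold_errors.append(sum(sq) / len(sq))
    return sum(fold_errors) / k


def select_by_cv(X, y, grid, fit, predict, k=4):
    """Pick the parameter set with the lowest CV error."""
    return min(grid, key=lambda p: cv_mse(X, y, p, fit, predict, k))


# Toy "regressor" (hypothetical): predicts shrink * mean of the training targets.
def fit_shrunk_mean(X, y, params):
    return params["shrink"] * sum(y) / len(y)


def predict_constant(model, x):
    return model
```

After `select_by_cv` returns, the final model is fitted once more on all the data with the chosen parameters.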
Your code doesn't really make sense to me. Look at this snippet:
folds = 4;
for i=1:numel(C)
    cv_acc(i) = svmtrain(ground_truth, matrice_feature_train', ...
        sprintf(' -s %d -t %d -c %f -g %f -p %d -v %d', 3, 2, ...
        2^C(i), 2^gamma(i), 1, folds)); %Kernel RBF
end
What does cv_acc contain? Without the -v option, svmtrain would return the actual SVM model (an SVMStruct if you use the MATLAB toolbox, a model struct if you use LIBSVM), and a loop like this would only make sense if it changed which folds are used as the training set, not the values of gamma and C. With LIBSVM's -v option, however, svmtrain returns a scalar instead of a model: the cross-validation accuracy for classification, or the cross-validation mean squared error for regression (-s 3). So calling min(cv_acc) only makes sense if you are sure cv_acc holds that cross-validation error. Either way, you aren't actually interested in minimising your training error; you want to minimise your cross-validation error, which is the aggregate of the test error from your k runs and has nothing to do with your training error.
Now, whether you've got this wrong depends on how the vectors of gamma and C are built. A single loop rather than a nested loop only works if they are arranged like a truth table (as MESHGRID does), so that each potential value of C is paired with each value of gamma. With two plain vectors, you would only be trying one value of gamma for each value of C.
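The difference between the two cases can be seen with the grid from the question, written in Python (a sketch, not MATLAB): `itertools.product` corresponds to MESHGRID followed by one loop over `numel(C)`, while `zip` corresponds to a single loop over two plain vectors.

```python
from itertools import product

log2c = list(range(-5, 16, 2))   # like -5:2:15  (11 values)
log2g = list(range(-15, 4, 2))   # like -15:2:3  (10 values)

# MESHGRID + a single loop over numel(C): every C meets every gamma.
# For [C,gamma] = meshgrid(x, y), C(:) and gamma(:) come out in this same order.
all_pairs = list(product(log2c, log2g))

# A single loop over two plain vectors: element-wise pairing only.
paired_only = list(zip(log2c, log2g))

print(len(all_pairs), len(paired_only))
```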
Have a look at this answer to see an example of cross-validation used with SVM.
I'm trying to implement a simple linear SVM binary classifier in MATLAB, but I get strange results.
I have two classes g = {-1; 1} defined by two predictors, varX and varY. In fact, varY alone is enough to split the dataset into two distinct classes (at about varY = 0.38), but I will keep varX as a random variable since I will need it for other work.
Using the code below (adapted from the MATLAB examples), I get a wrong classifier. The linear classifier should be close to a horizontal line at about varY = 0.38, as we can see by plotting the 2D points.
The line that should separate the two classes is not displayed.
What am I doing wrong?
g(1:14,1)=1;
g(15:26,1)=-1;
m3(:,1)=rand(26,1); %varX
m3(:,2)=[0.4008; 0.3984; 0.4054; 0.4048; 0.4052; 0.4071; 0.4088; 0.4113; 0.4189;
0.4220; 0.4265; 0.4353; 0.4361; 0.4288; 0.3458; 0.3415; 0.3528;
0.3481; 0.3564; 0.3374; 0.3610; 0.3241; 0.3593; 0.3434; 0.3361; 0.3201]; %varY
SVMmodel_testm = fitcsvm(m3,g,'KernelFunction','Linear');
d = 0.005; % Step size of the grid
[x1Grid,x2Grid] = meshgrid(min(m3(:,1)):d:max(m3(:,1)),...
min(m3(:,2)):d:max(m3(:,2)));
xGrid = [x1Grid(:),x2Grid(:)]; % The grid
[~,scores2] = predict(SVMmodel_testm,xGrid); % The scores
figure();
h(1:2)=gscatter(m3(:,1), m3(:,2), g,'br','ox');
hold on
% Support vectors
h(3) = plot(m3(SVMmodel_testm.IsSupportVector,1),m3(SVMmodel_testm.IsSupportVector,2),'ko','MarkerSize',10);
% Decision boundary
contour(x1Grid,x2Grid,reshape(scores2(:,1),size(x1Grid)),[0 0],'k');
xlabel('varX'); ylabel('varY');
set(gca,'Color',[0.5 0.5 0.5]);
hold off
A common problem with SVM, or any classification method for that matter, is unnormalized data. You have one dimension that spans from 0 to 1 and the other from about 0.3 to 0.4. This causes an imbalance between the features. Common practice is to normalize the features somehow, for example by their standard deviation. Try this code:
g(1:14,1)=1;
g(15:26,1)=-1;
m3(:,1)=rand(26,1); %varX
m3(:,2)=[0.4008; 0.3984; 0.4054; 0.4048; 0.4052; 0.4071; 0.4088; 0.4113; 0.4189;
0.4220; 0.4265; 0.4353; 0.4361; 0.4288; 0.3458; 0.3415; 0.3528;
0.3481; 0.3564; 0.3374; 0.3610; 0.3241; 0.3593; 0.3434; 0.3361; 0.3201]; %varY
m3(:,2) = m3(:,2)./std(m3(:,2));
SVMmodel_testm = fitcsvm(m3,g,'KernelFunction','Linear');
Notice the line before the last one, which divides varY by its standard deviation.
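A Python sketch of per-feature scaling, for illustration only: `column_stats` and `scale` are hypothetical names. With `center=False` it does what the answer's line does (divide by the standard deviation); with `center=True` it computes a z-score. The statistics should come from the training data and be reused for anything evaluated later, such as the plotting grid.

```python
import statistics


def column_stats(rows):
    """Per-column mean and sample standard deviation (n-1, like MATLAB's std)."""
    cols = list(zip(*rows))
    return [statistics.mean(c) for c in cols], [statistics.stdev(c) for c in cols]


def scale(rows, means, stds, center=True):
    """z-score each column; with center=False only divide by the std,
    which is what m3(:,2)./std(m3(:,2)) does for a single column."""
    return [[((v - m) if center else v) / s for v, m, s in zip(row, means, stds)]
            for row in rows]
```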
I am trying to do 5-fold cross-validation with LIBSVM (MATLAB) using a precomputed kernel, but I get the following error message:
Undefined function 'ge' for input arguments of type 'struct'.
This is because LIBSVM returns a structure instead of a value in cross-validation. How can I solve this problem? This is my code:
load('iris.dat')
data=iris(:,1:4);
class=iris(:,5);
% normalize the data
range=repmat((max(data)-min(data)),size(data,1),1);
data=(data-repmat(min(data),size(data,1),1))./range;
% train
tr_data=[data(1:5,:);data(52:56,:);data(101:105,:)];
tr_lbl=[ones(5,1);2*ones(5,1);3*ones(5,1)];
% kernel computation
sigma=.8
rbfKernel = @(X,Y,sigma) exp((-pdist2(X,Y,'euclidean').^2)./(2*sigma^2));
Ktr=[(1:15)',rbfKernel(tr_data,tr_data,sigma)];
kts=[ (1:150)',rbfKernel(data,tr_data,sigma)];
% svmptrain
bestcv = 0;
for log2c = -1:3
cmd = ['Ktr -t 4 -v 5 -c ', num2str(2^log2c)];
cv = svmtrain2(tr_lbl,tr_data, cmd);
if (cv >= bestcv)
bestcv = cv;
bestc = 2^log2c;
end
end
cmd=['-s 0 -c ', num2str(bestc), 'Ktr -t 4']
model=svmtrain2(tr_lbl,tr_data,cmd)
% svm predict
labels=svmpredict(class,data,model,kts)
The function svmtrain2 you are using is not part of standard MATLAB, and its output is not a structure. But if you insist on using it, you can calculate a score for the data using the other existing function:
[f,K] = svmeval(X_eval,varargin)
which evaluates the trained SVM using the outputs of svmtrain2. But I prefer to use the standard functions built into MATLAB first. In the standard MATLAB library there is:
SVMStruct = svmtrain(Training,Group)
which returns a structure, SVMStruct, containing information about the trained support vector machine (SVM) classifier, or
SVMModel = fitcsvm(X,Y)
which returns a support vector machine classifier, SVMModel, trained on predictors X and class labels Y for one- or two-class classification. You can then get a score for each prediction using:
[label,Score] = predict(SVMModel,X)
that returns class likelihood measures, i.e., either scores or posterior probabilities.
You get that error because you are trying to compare a struct and a number.
If what you want is to find the best performance on the training set (as your comparison suggests), I don't think you can get it directly from the structure returned by svmtrain. You should first call svmpredict with the training set and the trained model; its second output is a vector whose first element is the accuracy.
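For reference, the Ktr and kts matrices in the question follow LIBSVM's precomputed-kernel layout for -t 4: the first column is the sample's serial number and the remaining columns are kernel values against the training samples. As far as I know, LIBSVM's MATLAB README passes this matrix (for example [(1:n)', K]) in place of the feature matrix, with only options in the option string. A Python sketch of building that layout, with hypothetical helper names:

```python
import math


def rbf(a, b, sigma):
    """Gaussian RBF kernel exp(-||a - b||^2 / (2 sigma^2)), as in the question."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * sigma ** 2))


def precomputed_kernel(rows, train_rows, sigma):
    """Precomputed-kernel layout: row i is
    [i (1-based serial number), K(rows[i], train_1), ..., K(rows[i], train_n)]."""
    return [[i + 1] + [rbf(r, t, sigma) for t in train_rows]
            for i, r in enumerate(rows)]
```

Training rows use the training samples against themselves (like Ktr); test rows use the test samples against the same training samples (like kts).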
I am trying to implement a naive Bayes classifier using a dataset published by the UCI machine learning team. I am new to machine learning and am trying to understand techniques to use for my work-related problems, so I thought it would be better to understand the theory first.
I am using the Pima dataset (Link to Data - UCI-ML), and my goal is to build a naive Bayes univariate Gaussian classifier for a K-class problem (the data only has K = 2). I have split the data and calculated the mean and standard deviation for each class and the prior for each class, but after this I am kind of stuck because I am not sure what I should do next and how. I have a feeling that I should be calculating the posterior probability.
Here is my code. I am using percent as a vector because I want to see the behavior as I increase the training data size from the 80:20 split. Basically, if you pass [10 20 30 40], it takes that percentage of the 80% training portion; e.g. with 10 it uses 10% of the 80% as training data.
function[classMean] = naivebayes(file, iter, percent)
dm = load(file);
for i=1:iter
idx = randperm(size(dm.data,1))
%Using same idx for data and labels
shuffledMatrix_data = dm.data(idx,:);
shuffledMatrix_label = dm.labels(idx,:);
percent_data_80 = round((0.8) * length(shuffledMatrix_data));
%Doing 80-20 split
train = shuffledMatrix_data(1:percent_data_80,:);
test = shuffledMatrix_data(percent_data_80+1:length(shuffledMatrix_data),:);
train_labels = shuffledMatrix_label(1:percent_data_80,:)
test_labels = shuffledMatrix_label(percent_data_80+1:length(shuffledMatrix_data),:);
%Getting the array of percents
for pRows = 1:length(percent)
percentOfRows = round((percent(pRows)/100) * length(train));
new_train = train(1:percentOfRows,:)
new_trin_label = shuffledMatrix_label(1:percentOfRows)
%get unique labels in training
numClasses = size(unique(new_trin_label),1)
classMean = zeros(numClasses,size(new_train,2));
for kclass=1:numClasses
classMean(kclass,:) = mean(new_train(new_trin_label == kclass,:))
std(new_train(new_trin_label == kclass,:))
priorClassforK = length(new_train(new_trin_label == kclass))/length(new_train)
priorClassforK_1 = 1 - priorClassforK
end
end
end
end
First, compute the prior probability of every class label from frequency counts. Then, for a given data sample and a given class in your data set, compute the conditional probability of every feature. After that, multiply the conditional probabilities of all the features in the sample by each other and by the prior probability of the class being considered. Finally, compare the values for all class labels and choose the class with the maximum value (Bayes classification rule).
To compute each conditional probability, you can simply use the normal (Gaussian) density function with that class's mean and standard deviation for the feature.
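The steps above can be sketched in Python (the question's code is MATLAB; this is only an illustration, and `fit_gaussian_nb`, `posteriors` and `predict` are hypothetical names). It works with logarithms, which is equivalent to multiplying the prior by the per-feature densities but avoids numerical underflow, and it normalizes the result so the values are actual posterior probabilities. It assumes every class has at least two samples and non-zero spread in every feature.

```python
import math


def fit_gaussian_nb(X, y):
    """Per class: prior from frequency counts, plus per-feature mean and
    sample standard deviation."""
    params = {}
    for c in sorted(set(y)):
        rows = [x for x, label in zip(X, y) if label == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(col) for col in cols]
        stds = [math.sqrt(sum((v - m) ** 2 for v in col) / (len(col) - 1))
                for col, m in zip(cols, means)]
        params[c] = (len(rows) / len(X), means, stds)
    return params


def log_normal_pdf(x, mu, sigma):
    """log of the univariate normal density N(x; mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def posteriors(params, x):
    """P(class | x) via Bayes' rule; the log turns
    prior * prod_j p(x_j | class) into a sum."""
    log_joint = {c: math.log(prior) + sum(log_normal_pdf(v, m, s)
                                          for v, m, s in zip(x, means, stds))
                 for c, (prior, means, stds) in params.items()}
    top = max(log_joint.values())
    unnorm = {c: math.exp(v - top) for c, v in log_joint.items()}
    total = sum(unnorm.values())
    return {c: u / total for c, u in unnorm.items()}


def predict(params, x):
    """Bayes classification rule: the class with the largest posterior."""
    post = posteriors(params, x)
    return max(post, key=post.get)
```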