How to use dataset of UCI for classification of RVM? - matlab

I have downloaded some datasets from UCI for classification of RVM task.However,I am not sure about how to use it.I guess that these datasets must be normalized or do some other job before using it for training and testing.
For example,I have downloaded 'banknote authentication Data Set' on UCI.And use svmtrain in matlab to obtain a svm model(use svm model for testing data and then use rvm codes if result of svm classification is ok).
>> load banknote
>> meas = banknote(:,1:4);
>> species = banknote(:,5);
>> data = [meas(:,1), meas(:,2), meas(:,3), meas(:,4)];
>> groups = ismember(species,1);
>> [train, test] = crossvalind('holdOut',groups);
>> cp = classperf(groups);
>> svmStruct = svmtrain(data(train,:),groups(train),'showplot',true);
These is what I do in matlab,and get the following message:
??? Error using ==> svmtrain at 470
Unable to solve the optimization problem:
Maximum number of iterations exceeded; increase options.MaxIter.
To continue solving the problem with the current solution as the
starting point, set x0 = x before calling quadprog.
And here are a part of the dataset(total lines 1372 and use some for training and the rest for testing):
3.6216,8.6661,-2.8073,-0.44699,0
4.5459,8.1674,-2.4586,-1.4621,0
3.866,-2.6383,1.9242,0.10645,0
3.4566,9.5228,-4.0112,-3.5944,0
0.32924,-4.4552,4.5718,-0.9888,0
4.3684,9.6718,-3.9606,-3.1625,0
3.5912,3.0129,0.72888,0.56421,0
2.0922,-6.81,8.4636,-0.60216,0
3.2032,5.7588,-0.75345,-0.61251,0
1.5356,9.1772,-2.2718,-0.73535,0
1.2247,8.7779,-2.2135,-0.80647,0
3.9899,-2.7066,2.3946,0.86291,0
1.8993,7.6625,0.15394,-3.1108,0
-1.5768,10.843,2.5462,-2.9362,0
3.404,8.7261,-2.9915,-0.57242,0
So, any good advice about this problem?Thank you all for helping.

Later to commit.Use scale function to normalization the feature.And if the datasets have too many features,we can use PCA to reduce dimension.

Related

In MATLAB, how do we classify or regress using a partitioned model?

It seems that cross validated models cannot be used with the predict function. How would one go about using the model with a test set? For example:
ens = fitcecoc(X, T, 'KFold', 10)
Directly using the predict function throws an error and MATLAB documentation explains why it does so very well. ens is a partitioned model with 10 different classifiers. Should we run predict using each classifier and then use the class with the maximum agreement?
Couple of other similar question haven't received answers so I figured I'll answer my own question with the solution I found. MATLAB K-Fold cross validation produces K different classifiers or regressors. They have been generated by holding out portions of data (holdout operation is random, hence if you have an unbalanced dataset - be careful). In order to predict the output class, you could iterate over all the trained K models and use mode to get the accurate class.
cv_Ensemble = crossval(Ensemble_Model, 'KFold', 10);
classIdx = zeros(N, length(cv_Ensemble .Trained));
for p = 1:length(cv_Ensemble .Trained)
[temp, ~] = predict(cv_Ensemble.Trained{p}, Data_f);
classIdx(:, p) = temp;
end
classIdx = mode(classIdx, 2)

How to fix the test and train set in svm using matlab?

I am new to matlab. I want to know how to fixed the train and test set in svm code because I had find a code, the code randomly selects the test and train set. my database is YMU database, how should I fix the train and test set using svm code. because I use the crossvalind to randomly select the train and test set. which variable should I change with the crossvalind?
%load YMU database
%NMC is non-makeup , MC is makeup
%testingset = non-makeup, trainingset is makeup
load TestingSetNMC.mat
load TrainingSetMC.mat
load gnd_Test.mat
load gnd_Train.mat
data1 = TrainingSet;
data2 = TestingSet;
groups1 = ismember(gnd_Train,'data1');
groups2 = ismember(gnd_Test,'data2');
%crossvalind is random choose
[train] = crossvalind('holdOut',groups1);
[test] = crossvalind('holdOut',groups2);
cp = classperf(groups1);
svmStruct = svmtrain(data1(train,:),groups1(train),'showplot',true);
classes = svmclassify(svmStruct,data2(test,:),'showplot',true);
classperf(cp,classes,test);
cp.CorrectRate
With (most) matlab functions that generate pseudo-random output you can control that output by explicitly specifying a random number generator's seed and method.
In your case, place the following line anywhere before you call crossvalind:
rng(1, 'twister');
This sets the seed to 1 and the method to Mersenne Twister. In the documentation for rng you will find a more detailed explanation about controlling pseudo-random output.

How to improve the perfomance of SVM?

Im using LIBSVM and MatLab to classify 34x5 data in 3 classes. I applied 10 fold Kfold cross validation method and RBF kernel. The output is this confusion matrix with 0.88 Correct rate (88 % accuracy). This is my confusion matrix
9 0 0
0 3 0
0 4 18
I would like to know what methods inside SVM to consider to improve the accuracy or other classifications method in Machine learning techniques. Any help?
Here is my SVM classification code
load Turn180SVM1; //load data file
libsvm_options = '-s 1 -t 2 -d 3 -r 0 -c 1 -n 0.1 -p 0.1 -m 100 -e 0.000001 -h 1 -b 0 -wi 1 -q';//svm options
C=size(Turn180SVM1,2);
% cross validation
for i = 1:10
indices = crossvalind('Kfold',Turn180SVM1(:,C),10);
cp = classperf(Turn180SVM1(:,C));
for j = 1:10
[X, Z] = find(indices(:,end)==j);%testing
[Y, Z] = find(indices(:,end)~=j);%training
feature_training = Turn180SVM1([Y'],[1:C-1]); feature_testing = Turn180SVM1([X'],[1:C-1]);
class_training = Turn180SVM1([Y'],end); class_testing = Turn180SVM1([X'], end);
% SVM Training
disp('training');
[feature_training,ps] = mapminmax(feature_training',0,1);
feature_training = feature_training';
feature_testing = mapminmax('apply',feature_testing',ps)';
model = svmtrain(class_training,feature_training,libsvm_options);
%
% SVM Prediction
disp('testing');
TestPredict = svmpredict(class_testing,sparse(feature_testing),model);
TestErrap = sum(TestPredict~=class_testing)./length(class_testing)*100;
cp = classperf(cp, TestPredict, X);
disp(((i-1)*10 )+j);
end;
end;
[ConMat,order] = confusionmat(TestPredict,class_testing);
cp.CorrectRate;
cp.CountingMatrix;
Many methods exist. If your tuning procedure is optimal (e.g. well executed cross-validation) your choices include:
Improve preprocessing, perhaps tailor new aggregated features based on domain knowledge. Most importantly (and most effectively): make sure your inputs are standardized properly, for example by scaling every dimension onto [-1,1].
Use another kernel: RBF kernels are known to perform very well in a wide variety of settings, but specialised kernels exist for many tasks. Don't consider this unless you know what you are doing. Since you are dealing with a low-dimensional problem, RBF is probably a good choice if your data is not structured.
Reweigh training instances: particularly important when your data set is unbalanced (e.g. some classes have a lot less instances than others). You can do this with the -wX options in libsvm. All sorts of reweighting schemes exist, including variants of boosting. I'm not a major fan of this, since such approaches are prone to overfitting.
Change the cross-validation cost function to suit your exact needs. Is accuracy really what you are looking for or do you want, say, high F1 or high ROC-AUC? It is surprising how many people optimize a performance measure they are not really interested in.

Thumb Recognition in matlab using SVM algo

I am working on a project thumb recognition. following is code I am reading the 118 images of order 42 X 25 and storing them in training matrix.
training=zeros(118, 1050);
imagefiles = dir('*.png');
nfiles = length(imagefiles);
for ii=1:nfiles
currentfilename = imagefiles(ii).name;
I = imread(currentfilename);
BW=im2bw(I,graythresh(I));
temp = reshape(BW,1,1050);
training(ii,:)=temp;
end
Now I am creating a matrix of labelData to assign labels to images.
labelData = zeros(118,1);
labelData(1:50,:) = 0;
labelData(51:83,:) = 1;
labelData(84:118,:) = 2;
Here i am training my system by giving training data and label data.
options=optimset('MaxIter',5000);
SVMStruct = svmtrain(training,labelData,'Kernel_Function','linear','QuadProg_Opts',options);
BUT when I run this code it is giving me an error like
Error 1 : SVMTRAIN only supports classification into two groups. GROUP contains 3 groups.
Error 2 : SVMStruct = svmtrain(training,labelData,'Kernel_Function','linear','QuadProg_Opts',options);
Kindly help me what is the problem I used it before it was working fine but now I dont know what is going on. Thanks in advance.
Error 1 tells you what the problem is - the MATLAB built-in SVM only supports binary classification. You are assigning 3 classes.
Your options are:
Construct three classifiers: 0 vs. 1,2 then 1 vs. 0,2 then 2 vs. 0,1 and look at the output of each.
Construct 0 vs. not 0 and then 1 vs. 2
Use a multi-class SVM trainer from LIBSVM or svmlight or other such packages.
The error message is pretty clear. MATLAB's svmtrain does not support multiclass classification, that is only two classes are allowed.
So, you have two options: 1) write your own multiclass classifier as a wrapper around svmtrain. You can implement one-vs-all or one-vs-one strategies. 2) use a svm implementation that already supports multiclass classification such as libsvm.
Your problem is in the labelData vector ceck it and find the eror, yoy shoild OAA architector if hthe number of classes is more then .

Matlab Weka Interface AdaBoost Issues: Out of Bounds Exception

I'm doing some cross-validation using a Matlab Weka Interface that I got from file exchange. My loop structure seems to work fine for Weka's Logistic classifier. However, when I try to do the exact same thing for AdaBoostM1, it throws the following error:
??? Java exception occurred: java.lang.ArrayIndexOutOfBoundsException
Error in ==> wekaClassify at 24 classProbs(t+1,:) = (classifier.distributionForInstance(testData.instance(t)))';
Error in ==> classifier_search at 225 [pred ~] = wekaClassify(matlab2weka('instance', featurelabels, tester), classifier);
I have determined through some testing that this only occurs when the number of instances in the training set is greater than the number of instances in the test set. I am sure you can see why that is a problem for me, since in most situations the training set is greater than the test set in size.
Is there something different about how I should format my inputs when using Adaboost rather than Logistic? Any information you can give regarding this problem would be so helpful.
I downloaded this code from this page: http://www.mathworks.com/matlabcentral/fileexchange/21204-matlab-weka-interface
Emails bounce from the account of the guy who made it, and he doesn't seem to respond to comments on the page - I'm hoping that maybe someone here has used this.
EDIT: Here is the code that I use to train and test the classifier:
classifier = trainWekaClassifier(matlab2weka('training', featurelabels, train), 'meta.AdaBoostM1', { strcat('-P 100 -S 1 -I ', num2str(r), '-W weka.classifiers.trees.DecisionStump')});
[pred ~] = wekaClassify(matlab2weka('instance', featurelabels, tester), classifier);
I haven't used this combination of software, so I can only take a guess at what could cause this.
Are your training/testing data matrices the right way round? They should be N-by-D (N instances, D features).
If you were passing in a D-by-N training matrix and a D-by-M testing matrix, then I would expect it to work only when M < N - which is what you describe - and even then, it wouldn't give a meaningful result.