How can I get a SVM that has been trained on a bigger matrix to classify a different size matrix - matlab

I am training a one vs all svm classifier. I used a 200 by 459 matrix to train the classifier using VLFeat svm classifier. (http://www.vlfeat.org/matlab/vl_svmtrain.html)
[W B] = vl_svmtrain(train_image_feats', tmp', .00001);
where train_image_feats' is a 200 by 459 matrix, and tmp' is the label matrix which is 1 by 459 vector.
The above command trains the svm with no problem, but then to classify the scores obtained on the test matrix I get an error. The test matrix is obviously not of the same size as that of the training matrix.
scores(i, :) = W'*test_image_feats' + B;
Where test_image_feats' is a 200 by 90 matrix. scores is a 9 by 459 matrix. 9 Because there are 9 categories(labels) to classify and 459 are the number of training images.
The above command gives the error:
Subscripted assignment dimension mismatch.
Error in svm_classify (line 56) scores(i, :) = W'*test_image_feats'
+ B;
Edit: Full code added..
categories = unique(train_labels);
num_categories = length(categories);
scores = zeros([num_categories size(train_labels, 1)]); %train_labels is 459 by 1 size
for i=1:num_categories %there are 9 categories
tmp = strcmp(train_labels, categories{i});
tmp = tmp - (1-tmp);
[W B] = vl_svmtrain(train_image_feats', tmp', .00001);
scores(i, :) = W'*test_image_feats' + B;
end
predicted_categories = cell(size(train_labels));
parfor i=1:size(test_image_feats,1)
image_scores = scores(:, i);
label_index = find(image_scores==max(image_scores));
predicted_categories{i}=categories(label_index);
end

Conceptually you are training a model with 459 training samples to predict the scores of 90 test samples.
scores = zeros([num_categories size(train_labels, 1)]);
isn't right as it will be the size of the training set. In fact you don't have to care at all about the size of the training set, you could train the model with 20 or 20000 images the prediction step shouldn't be any different.
scores have to be defined with the test case in mind
scores = zeros([num_categories size(test_labels, 1)]);
When you used 459 for both it only worked because size(test_labels, 1) was equal to size(train_labels, 1)

The problem is not with your right hand side of the assignment, but with score(i,:): you are trying to assign a 9-by-90 size matrix into a single row of score - this simply won't fit.

Related

Confusion matrix error with less test data than training data

I have an issue with my model accuracy calculation. I used the code below:
y_train = [ 1 1 1 4 4 3 3 5 5 5 ]; % true labels for x_train
%x_test : has no true labels.
predictedLabel=[ 1 2 3 4 5 ]; % predicted labels for x_test
group=y_train ; % 10
grouphat=predictedLabel; % for test 5 test data
C=confusionmat(group,grouphat);
Accuracy = sum ( diag (C)) / sum (C (:)) ×100;
but I get the error:
Error using confusionmat (line 75)
G and GHAT need to have same number of rows
Do I get this error since the test data is more or less than the train? There is no true label for test data (semi supervised learning).
Your training labels and predicted labels are based on different inputs, so it doesn't make sense to compare them in a confusion matrix. From the confusionmat docs:
returns the confusion matrix C determined by the known and predicted groups
i.e. the known and predicted results for the same data.
Take this partly pseudo-code example, see the comments for details
% split your input data
trainData = data(1:100, :); % Training data
testData = data(101:120, :); % Testing data (mutually exclusive from training)
% Do some training (pseudo-code, not valid MATLAB)
% ** Let's assume that the labels are in column 1 **
model = train( trainData(:,1), trainData(:,2:end) );
% Test your model on the input data, excluding the actual labels in column 1
predictedLabels = model( testData(:,2:end) );
% Get the actual labels from column 1
actualLabels = testData(:,1);
% Note that size(predictedLabels) == size(actualLabels)
% Now we can do a confusion matrix
C = confusionmat( actualLabels, predictedLabels )

2 errors in libsvm matlab "Model does not support probabiliy estimates and Subscripted assignment dimension mismatch"

I want to classify a list of 5 test images using the library LIBSVM with a strategy 'one against all' in order to obtain probabilities for each class. the used code is bellow :
load('D:\xapp.mat');
load('D:\xtest.mat');
load('D:\yapp.mat');%% matrix contains true class of images yapp=[641;645;1001;1010;1100]
load('D:\ytest.mat');%% matrix contains unlabeled class of test set ytest=[1;2;3;4;5]
numLabels=max(yapp);
numTest=size(ytest,1);
%# train one-against-all models
model = cell(numLabels,1);
for k=1:numLabels
model{k} = svmtrain(double(yapp==k),xapp, ['-c 1000 -g 10 -b 1 ']);
end
%# get probability estimates of test instances using each model
prob = zeros(numTest,numLabels);
for k=1:numLabels
[~,~,p] = svmpredict(double(ytest==k), xtest, model{k}, '-b 1');
prob(:,k) = p(:,model{k}.Label==1); %# probability of class==k
end
%# predict the class with the highest probability
[~,pred] = max(prob,[],2);
acc = sum(pred == ytest) ./ numel(ytest) %# accuracy
I obtain this error :
Model does not support probabiliy estimates
Subscripted assignment dimension mismatch.
Error in comp (line 98)
prob(:,k) = p(:,model{k}.Label==1); %# probability of class==k
please, help me to solve this error and thanks in advance
What you're trying to do is to use a code snippet that evaluates a SVM classifier performances whereas your goal is to properly estimate the labels for your test set.
I assume your five labels are [641;645;1001;1010;1100] (as in yapp). First thing you have to do is delete ytest, because you don't know any labels for the test set. It is pointless to fill ytest with some dummy values: the SVMs will return our predicted labels.
The first error, as already pointed out in the comments is in
numLabels=max(yapp);
you must change max() with length() in order to gather the number of classes.
The training stage is almost correct.
Given the fact that k goes from 1 to 5 whereas yapp has the range above, you should consider changing double(yapp==k) into double(yapp==yapp(k)): in this manner we mark as positive the k-th value in yapp. Given the fact that k goes from 1 to 5, then yapp(k) will go from 641 to 1100.
And now the prediction stage.
The first input for svmpredict() should be the test labels but now we don't know them so we can fill it with a vector of zeros (there will be as many zeros as there are patterns in the test set). That is because svmpredict() automatically returns the accuracy as well if the test labels are known, but that's not the case. So you must change the second for-loop to
for k=1:numLabels
[~,~,p] = svmpredict(zeros(size(xtest,1),1), xtest, model{k}, '-b 1');
prob(:,k) = p(:,model{k}.Label==1); %# probability of class==k
end
and finally predict the labels with
[~,pred] = max(prob,[],2);
and pred contains the predicted labels.
Note 1: in this method, however, you cannot measure accuracy and/or other parameters because what we called the test set actually is not a test set. A test set is a labelled set and we pretend we don't know its labels in order to let the SVM predict them and then match the predicted labels with the actual labels in order to measure its accuracy.
Note 2: predicted labels in pred will most likely have values in range 1 to 5 due to the second for-loop. However, since your labels have different values, you can map back taking into account that 1 is 641, 2 is 645, 3 is 1001, 4 is 1010, 5 is 1100.

Find the classification rate of testing data

I need to use KNN search to classify the testing data and find the classification rate.
Below is the matlab code:
for example:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
load fisheriris
x = meas(:,3:4); % x =all training data
y = [5 1.45;6 2;2.75 .75]; % y =3 testing data
[n,d] = knnsearch(x,y,'k',10); % find the 10 nearest neighbors to three testing data
for b=1:3
tabulate(species(n(b,:)))
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The result was display in Command window:
tabulate(species(n(1,:)))
Value Count Percent
virginica 2 20.00%
versicolor 8 80.00%
tabulate(species(n(2,:)))
Value Count Percent
virginica 10 100.00%
tabulate(species(n(3,:)))
Value Count Percent
versicolor 7 70.00%
setosa 3 30.00%
If the testing points are 'Versicolor',the result of first and third testing point are classify correctly and second testing point is wrong one.So the classification rate is 2/3 x100%=66.7%.
Is there any idea to modify the matlab code to find the classification rate automatically and save the result into the Workspace?
In general you can find the number of correct predictions by using
sum(predicted_class == true_class) % For numerical data
sum(strcmp(predicted_class, true_class)) % For cellstrings
Or as a percentage
100 * sum(predicted_class == true_class) / length(predicted_class)
In the case of fisheriris the true class would be species. For your constructed data it would be
true_classes = [cellstr('versicolor'); cellstr('versicolor'); cellstr('versicolor')]
In the case of nearest neighbours, the true classes would be the class of the nearest neighbour(s). For a single neighbour:
predicted_class = species(n)
Where n is the index of the nearest neighbour as found by [n, d] = knnsearch(x, y).
sum(strcmp(predicted_class, true_class))
% result: 1
Which is indeed correct when you use only one neighbor.

Matlab : decision tree shows invalid output values

I'm making a decision tree using the classregtree(X,Y) function. I'm passing X as a matrix of size 70X9 (70 data objects, each having 9 attributes), and Y as a 70X1 matrix. Each one of my Y values is either 2 or 4. However, in the decision tree formed, it gives values of 2.5 or 3.5 for some of the leaf nodes.
Any ideas why this might be caused?
You are using classregtree in regression mode (which is the default mode).
Change the mode to classification mode.
Here is an example using CLASSREGTREE for classification:
%# load dataset
load fisheriris
%# split training/testing
cv = cvpartition(species, 'holdout',1/3);
trainIdx = cv.training;
testIdx = cv.test;
%# train
t = classregtree(meas(trainIdx,:), species(trainIdx), 'method','classification', ...
'names',{'SL' 'SW' 'PL' 'PW'});
%# predict
pred = t.eval(meas(testIdx,:));
%# evaluate
cm = confusionmat(species(testIdx),pred)
acc = sum(diag(cm))./sum(testIdx)
The output (confusion matrix and accuracy):
cm =
17 0 0
0 13 3
0 2 15
acc =
0.9
Now if your target class is encoded as numbers, the returned prediction will still be cell array of strings, so you have to convert them back to numbers:
%# load dataset
load fisheriris
[species,GN] = grp2idx(species);
%# ...
%# evaluate
cm = confusionmat(species(testIdx),str2double(pred))
acc = sum(diag(cm))./sum(testIdx)
Note that classification will always return strings, so I think you might have mistakenly used the method=regression option, which performs regression (numeric target) not classification (discrete target)

MATLAB - usage of knnclassify

When doing:
load training.mat
training = G
load testing.mat
test = G
and then:
>> knnclassify(test.Inp, training.Inp, training.Ltr)
??? Error using ==> knnclassify at 91
The length of GROUP must equal the number of rows in TRAINING.
Since:
>> size(training.Inp)
ans =
40 40 2016
And:
>> length(training.Ltr)
ans =
2016
How can I give the second parameter of knnclassify (TRAINING) the training.inp 3-D matrix so that the number of rows will be 2016 (the third dimension)?
Assuming that your 3D data is interpreted as 40-by-40 matrix of features for each of the 2016 instances (third dimension), we will have to re-arrange it as a matrix of size 2016-by-1600 (rows are samples, columns are dimensions):
%# random data instead of the `load data.mat`
testing = rand(40,40,200);
training = rand(40,40,2016);
labels = randi(3, [2016 1]); %# a class label for each training instance
%# (out of 3 possible classes)
%# arrange data as a matrix whose rows are the instances,
%# and columns are the features
training = reshape(training, [40*40 2016])';
testing = reshape(testing, [40*40 200])';
%# k-nearest neighbor classification
prediction = knnclassify(testing, training, labels);