Confusion matrix error with less test data than training data - matlab

I have an issue with my model accuracy calculation. I used the code below:
y_train = [ 1 1 1 4 4 3 3 5 5 5 ]; % true labels for x_train
%x_test : has no true labels.
predictedLabel=[ 1 2 3 4 5 ]; % predicted labels for x_test
group=y_train ; % 10
grouphat=predictedLabel; % for test 5 test data
C=confusionmat(group,grouphat);
Accuracy = sum(diag(C)) / sum(C(:)) * 100;
but I get the error:
Error using confusionmat (line 75)
G and GHAT need to have same number of rows
Do I get this error because the number of test samples differs from the number of training samples? There are no true labels for the test data (semi-supervised learning).

Your training labels and predicted labels are based on different inputs, so it doesn't make sense to compare them in a confusion matrix. From the confusionmat docs:
returns the confusion matrix C determined by the known and predicted groups
i.e. the known and predicted results for the same data.
Take this partly pseudo-code example; see the comments for details:
% split your input data
trainData = data(1:100, :); % Training data
testData = data(101:120, :); % Testing data (mutually exclusive from training)
% Do some training (pseudo-code, not valid MATLAB)
% ** Let's assume that the labels are in column 1 **
model = train( trainData(:,1), trainData(:,2:end) );
% Test your model on the input data, excluding the actual labels in column 1
predictedLabels = model( testData(:,2:end) );
% Get the actual labels from column 1
actualLabels = testData(:,1);
% Note that size(predictedLabels) == size(actualLabels)
% Now we can do a confusion matrix
C = confusionmat( actualLabels, predictedLabels )
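As a minimal runnable sketch of the same idea, the snippet below compares known and predicted labels for the same five samples and derives the accuracy from the confusion matrix. The label values are made up for illustration; only confusionmat comes from the question.

```matlab
% Hypothetical labels for the SAME five test samples (illustrative values)
actualLabels    = [1; 2; 3; 4; 5];   % known labels
predictedLabels = [1; 2; 3; 4; 4];   % model output, one entry per sample

% Both vectors have the same number of rows, so confusionmat accepts them
C = confusionmat(actualLabels, predictedLabels);

% Correct predictions lie on the diagonal of C
accuracy = 100 * sum(diag(C)) / sum(C(:));
```

If no true labels exist for the test set, as in the question, there is nothing to compare against, and an accuracy cannot be computed this way.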

Related

Leave one out crossvalind in Matlab

I have extracted HOG features from male and female pictures, and now I'm trying to use the leave-one-out method to classify my data.
The standard way to write it in MATLAB is:
[Train, Test] = crossvalind('LeaveMOut', N, M);
What should I write in place of N and M?
Also, should this statement go inside or outside a loop?
This is my code. I have a training folder with male (80 images) and female (80 images) pictures, and another folder for testing (10 random images).
for i = 1:10
[Train, Test] = crossvalind('LeaveMOut', N, 1);
SVMStruct = svmtrain(Training_Set (Train), train_label (Train));
Gender = svmclassify(SVMStruct, Test_Set_MF (Test));
end
Notes:
Training_Set: an array consisting of the HOG features of the training folder images.
Test_Set_MF: an array consisting of the HOG features of the test folder images.
N: total number of images in training folder.
SVM should detect which images are male and which are female.
I will focus on how to use crossvalind for the leave-one-out-method.
I assume you want to select random sets inside a loop. N is the number of observations in your data, so set it to the length of your training set. M is the number of randomly selected observations returned in Test, which is also the number of observations left out of Train.
Here is an example, selecting M=2 observations out of the dataset.
dataset = [1 2 3 4 5 6 7 8 9 10];
N = length(dataset);
M = 2;
for i = 1:5
[Train, Test] = crossvalind('LeaveMOut', N, M);
% do whatever you want with Train and Test
dataset(Test) % display the test-entries
end
This outputs: (this is generated randomly, so you won't have the same result)
ans =
1 9
ans =
6 8
ans =
7 10
ans =
4 5
ans =
4 7
Since your code, according to this post, works on a matrix of features, you need to index its rows instead:
Training_Set = rand(10,3); % 10 samples with 3 features each
N = size(Training_Set,1);
M = 2;
for i = 1:5
[Train, Test] = crossvalind('LeaveMOut', N, M);
Training_Set(Train,:) % displays the data to train
end
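For a strict leave-one-out run, where every sample is held out exactly once, the loop can also be written with plain logical indexing instead of random draws. The sketch below is only an illustration under stated assumptions: the feature matrix and labels are made up, and a nearest-class-mean rule stands in for the SVM so that the snippet needs no toolbox.

```matlab
% Hypothetical data: 6 samples, 2 features, two well-separated classes
features = [1 2; 2 1; 8 9; 9 8; 1 1; 9 9];
labels   = [1; 1; 2; 2; 1; 2];

N = size(features, 1);
predicted = zeros(N, 1);

for i = 1:N
    % Hold out sample i, train on the remaining N-1 samples
    testMask = false(N, 1);
    testMask(i) = true;
    trainMask = ~testMask;

    % Stand-in classifier (an assumption, not the SVM from the question):
    % assign the held-out sample to the class with the nearest mean
    mu1 = mean(features(trainMask & labels == 1, :), 1);
    mu2 = mean(features(trainMask & labels == 2, :), 1);
    d1 = norm(features(i, :) - mu1);
    d2 = norm(features(i, :) - mu2);
    if d1 <= d2
        predicted(i) = 1;
    else
        predicted(i) = 2;
    end
end

% Percentage of held-out samples classified correctly
looAccuracy = 100 * mean(predicted == labels);
```

The training and prediction steps inside the loop are where the SVM calls from the question would go, indexed with trainMask and testMask on the same feature matrix.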

Find the classification rate of testing data

I need to use a KNN search to classify the testing data and find the classification rate.
Below is the MATLAB code, for example:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
load fisheriris
x = meas(:,3:4); % x = all training data
y = [5 1.45;6 2;2.75 .75]; % y = 3 testing points
[n,d] = knnsearch(x,y,'k',10); % find the 10 nearest neighbors of the three testing points
for b=1:3
tabulate(species(n(b,:)))
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
The result was displayed in the Command Window:
tabulate(species(n(1,:)))
Value Count Percent
virginica 2 20.00%
versicolor 8 80.00%
tabulate(species(n(2,:)))
Value Count Percent
virginica 10 100.00%
tabulate(species(n(3,:)))
Value Count Percent
versicolor 7 70.00%
setosa 3 30.00%
If the testing points are all 'versicolor', the first and third testing points are classified correctly and the second one is wrong, so the classification rate is 2/3 x 100% = 66.7%.
Is there a way to modify the MATLAB code to find the classification rate automatically and save the result to the workspace?
In general you can find the number of correct predictions by using
sum(predicted_class == true_class) % For numerical data
sum(strcmp(predicted_class, true_class)) % For cellstrings
Or as a percentage
100 * sum(predicted_class == true_class) / length(predicted_class)
In the case of fisheriris the true class would be species. For your constructed data it would be
true_class = [cellstr('versicolor'); cellstr('versicolor'); cellstr('versicolor')]
In the case of nearest neighbours, the predicted class would be the class of the nearest neighbour(s). For a single neighbour:
predicted_class = species(n)
Where n is the index of the nearest neighbour as found by [n, d] = knnsearch(x, y).
sum(strcmp(predicted_class, true_class))
% result: 1
Which is indeed correct when you use only one neighbor.
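With more than one neighbour, a majority vote per testing point gives the predicted class. The following is a rough sketch that assumes the neighbour labels are already available as a cell array with one row per testing point, as species(n) would return for a 3-by-10 index matrix n. To keep it runnable without the data set, the labels below are typed in by hand from the tabulate output in the question; the variable names are illustrative.

```matlab
% Neighbour labels per testing point, copied from the tabulate output
% in the question (in practice: neighbourLabels = species(n))
neighbourLabels = [ repmat({'versicolor'}, 1, 8), repmat({'virginica'}, 1, 2);
                    repmat({'virginica'},  1, 10);
                    repmat({'versicolor'}, 1, 7), repmat({'setosa'}, 1, 3) ];

% True classes of the three constructed testing points
trueClass = repmat({'versicolor'}, 3, 1);

numTest = size(neighbourLabels, 1);
predictedClass = cell(numTest, 1);

for b = 1:numTest
    % Count how often each class occurs among the neighbours of point b
    [classNames, ~, idx] = unique(neighbourLabels(b, :));
    counts = accumarray(idx(:), 1);

    % Majority vote: pick the most frequent class
    [~, best] = max(counts);
    predictedClass{b} = classNames{best};
end

% Classification rate in percent, stored in the workspace
classificationRate = 100 * sum(strcmp(predictedClass, trueClass)) / numTest;
```

Ties are resolved in favour of the class that sorts first alphabetically, because max returns the first maximum; a different tie-breaking rule may suit other data better.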

Matrix dimensions must agree min max normalization?

Hi, I get the error stated below while trying to normalize my data to between 0 and 1. The error I get is this:
columns =
6
??? Error using ==> minus
Matrix dimensions must agree.
Error in ==> Kmeans at 54
data = ((data-minData)./(maxData));
Not sure what I've done wrong. Full code below:
%% dimensionality reduction
columns = 6
[U,S,V]=svds(fulldata,columns);
%% randomly select dataset
rows = 1000;
columns = 6;
%# pick random rows
indX = randperm( size(fulldata,1) );
indX = indX(1:rows);
%# pick random columns
indY = randperm( size(fulldata,2) );
indY = indY(1:columns);
%# filter data
data = U(indX,indY);
%% apply normalization method to every cell
maxData = max(data);
minData = min(data);
data = ((data-minData)./(maxData));
The dataset is 1000x6.
From the Matlab documentation on min:
If A is a matrix, min(A) treats the columns of A as vectors, returning a row vector containing the minimum element from each column.
If you want to find the global minimum of a matrix, use either of the following forms:
min(min(A))
min(A(:))
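Because min(data) and max(data) return 1-by-6 row vectors, subtracting them from the 1000-by-6 matrix fails in MATLAB versions without implicit expansion (introduced in R2016b). If the intent is to scale each column separately rather than use one global minimum, a sketch with bsxfun could look like the one below. The small matrix is made-up example data, and dividing by the range (max minus min) is the usual min-max formula, which differs from the division by maxData in the question.

```matlab
% Hypothetical 3-by-2 data set (illustrative values)
data = [1 10; 2 20; 3 40];

% Column-wise extremes: 1-by-2 row vectors
minData = min(data, [], 1);
maxData = max(data, [], 1);

% Per-column min-max scaling; bsxfun expands the row vectors over all rows
normalized = bsxfun(@rdivide, bsxfun(@minus, data, minData), maxData - minData);

% Alternative: one global minimum and maximum for the whole matrix
globalMin = min(data(:));
globalMax = max(data(:));
normalizedGlobal = (data - globalMin) / (globalMax - globalMin);
```

A column whose values are all identical has a range of zero and would produce NaN here, so such columns need separate handling.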

Matlab : decision tree shows invalid output values

I'm making a decision tree using the classregtree(X,Y) function. I'm passing X as a matrix of size 70x9 (70 data objects, each having 9 attributes), and Y as a 70x1 matrix. Each one of my Y values is either 2 or 4. However, in the decision tree formed, it gives values of 2.5 or 3.5 for some of the leaf nodes.
Any ideas why this might be caused?
You are using classregtree in regression mode (which is the default mode).
Change the mode to classification mode.
Here is an example using CLASSREGTREE for classification:
%# load dataset
load fisheriris
%# split training/testing
cv = cvpartition(species, 'holdout',1/3);
trainIdx = cv.training;
testIdx = cv.test;
%# train
t = classregtree(meas(trainIdx,:), species(trainIdx), 'method','classification', ...
'names',{'SL' 'SW' 'PL' 'PW'});
%# predict
pred = t.eval(meas(testIdx,:));
%# evaluate
cm = confusionmat(species(testIdx),pred)
acc = sum(diag(cm))./sum(testIdx)
The output (confusion matrix and accuracy):
cm =
17 0 0
0 13 3
0 2 15
acc =
0.9
Now if your target class is encoded as numbers, the returned prediction will still be a cell array of strings, so you have to convert them back to numbers:
%# load dataset
load fisheriris
[species,GN] = grp2idx(species);
%# ...
%# evaluate
cm = confusionmat(species(testIdx),str2double(pred))
acc = sum(diag(cm))./sum(testIdx)
Note that classification always returns strings, so I think you might have mistakenly used the method=regression option, which performs regression (numeric target) rather than classification (discrete target).
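The conversion step can be checked in isolation. This small sketch assumes the predictions come back as a cell array of numeric strings, as described above; the prediction and label values are invented for illustration.

```matlab
% Hypothetical predictions returned as strings, and numeric true labels
pred   = {'2'; '4'; '4'; '2'};
actual = [2; 4; 2; 2];

% str2double converts a cell array of numeric strings to a double array
predNum = str2double(pred);

% Fraction of predictions that match the true labels
acc = mean(predNum == actual);
```

After the conversion both vectors are numeric, so they can be compared directly or passed to confusionmat together.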

MATLAB - usage of knnclassify

When doing:
load training.mat
training = G
load testing.mat
test = G
and then:
>> knnclassify(test.Inp, training.Inp, training.Ltr)
??? Error using ==> knnclassify at 91
The length of GROUP must equal the number of rows in TRAINING.
Since:
>> size(training.Inp)
ans =
40 40 2016
And:
>> length(training.Ltr)
ans =
2016
How can I give the second parameter of knnclassify (TRAINING) the training.Inp 3-D matrix so that the number of rows will be 2016 (the third dimension)?
Assuming that your 3D data is interpreted as a 40-by-40 matrix of features for each of the 2016 instances (third dimension), we will have to re-arrange it as a matrix of size 2016-by-1600 (rows are samples, columns are dimensions):
%# random data instead of the `load data.mat`
testing = rand(40,40,200);
training = rand(40,40,2016);
labels = randi(3, [2016 1]); %# a class label for each training instance
%# (out of 3 possible classes)
%# arrange data as a matrix whose rows are the instances,
%# and columns are the features
training = reshape(training, [40*40 2016])';
testing = reshape(testing, [40*40 200])';
%# k-nearest neighbor classification
prediction = knnclassify(testing, training, labels);
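To see that the reshape-and-transpose step keeps each instance together in one row, it helps to try it on a tiny array. The sizes and variable names below are illustrative only.

```matlab
% Small stand-in: 2-by-3 feature matrices for 4 instances
A = reshape(1:24, [2 3 4]);

% Flatten each 2-by-3 slice into a column, then transpose so that
% rows are instances and columns are features (4-by-6)
X = reshape(A, [2*3 4])';

% Row 2 of X should equal the second slice, read in column-major order
slice2 = A(:, :, 2);
rowMatches = isequal(X(2, :), slice2(:)');
```

The same pattern generalises to reshape(A, [], size(A, 3))' when the slice dimensions are not known in advance.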