When doing:
load training.mat
training = G
load testing.mat
test = G
and then:
>> knnclassify(test.Inp, training.Inp, training.Ltr)
??? Error using ==> knnclassify at 91
The length of GROUP must equal the number of rows in TRAINING.
Since:
>> size(training.Inp)
ans =
40 40 2016
And:
>> length(training.Ltr)
ans =
2016
How can I give the second parameter of knnclassify (TRAINING) the training.inp 3-D matrix so that the number of rows will be 2016 (the third dimension)?
Assuming that your 3D data is interpreted as 40-by-40 matrix of features for each of the 2016 instances (third dimension), we will have to re-arrange it as a matrix of size 2016-by-1600 (rows are samples, columns are dimensions):
%# random data instead of the `load data.mat`
testing = rand(40,40,200);
training = rand(40,40,2016);
labels = randi(3, [2016 1]); %# a class label for each training instance
%# (out of 3 possible classes)
%# arrange data as a matrix whose rows are the instances,
%# and columns are the features
training = reshape(training, [40*40 2016])';
testing = reshape(testing, [40*40 200])';
%# k-nearest neighbor classification
prediction = knnclassify(testing, training, labels);
Related
I have an issue with my model accuracy calculation. I used the code below:
y_train = [ 1 1 1 4 4 3 3 5 5 5 ]; % true labels for x_train
%x_test : has no true labels.
predictedLabel=[ 1 2 3 4 5 ]; % predicted labels for x_test
group=y_train ; % 10
grouphat=predictedLabel; % for test 5 test data
C=confusionmat(group,grouphat);
Accuracy = sum ( diag (C)) / sum (C (:)) ×100;
but I get the error:
Error using confusionmat (line 75)
G and GHAT need to have same number of rows
Do I get this error since the test data is more or less than the train? There is no true label for test data (semi supervised learning).
Your training labels and predicted labels are based on different inputs, so it doesn't make sense to compare them in a confusion matrix. From the confusionmat docs:
returns the confusion matrix C determined by the known and predicted groups
i.e. the known and predicted results for the same data.
Take this partly pseudo-code example, see the comments for details
% split your input data
trainData = data(1:100, :); % Training data
testData = data(101:120, :); % Testing data (mutually exclusive from training)
% Do some training (pseudo-code, not valid MATLAB)
% ** Let's assume that the labels are in column 1 **
model = train( trainData(:,1), trainData(:,2:end) );
% Test your model on the input data, excluding the actual labels in column 1
predictedLabels = model( testData(:,2:end) );
% Get the actual labels from column 1
actualLabels = testData(:,1);
% Note that size(predictedLabels) == size(actualLabels)
% Now we can do a confusion matrix
C = confusionmat( actualLabels, predictedLabels )
In matlab, I commonly have a data matrix of size NxMxLxK which I wish to index along specific dimension (e.g. the forth) using an indices matrix of size NxMxL with values 1..K (assume the are all in this range):
>>> size(Data)
ans =
7 22 128 40
>>> size(Ind)
ans =
7 22 128
I would like to have code without loops which achieve the following effect:
Result(i,j,k) = Data(i,j,k,Ind(i,j,k))
for all values of i,j,k in range.
You can vectorize your matrices and use sub2ind :
% create indices that running on all of the options for the first three dimensions:
A = kron([1:7],ones(1,22*128));
B = repmat(kron([1:22],ones(1,128)),1,7);
C = repmat([1:128],1,7*22);
Result_vec = Data(sub2ind(size(Data),A,B,C,Ind(:)'));
Result = reshape(Result_vec,7,22,128);
I have extracted HOG features for male and female pictures, now, I'm trying to use the Leave-one-out-method to classify my data.
Due the standard way to write it in Matlab is:
[Train, Test] = crossvalind('LeaveMOut', N, M);
What I should write instead of N and M?
Also, should I write above code statement inside or outside a loop?
this is my code, where I have training folder for Male (80 images) and female (80 images), and another one for testing (10 random images).
for i = 1:10
[Train, Test] = crossvalind('LeaveMOut', N, 1);
SVMStruct = svmtrain(Training_Set (Train), train_label (Train));
Gender = svmclassify(SVMStruct, Test_Set_MF (Test));
end
Notes:
Training_Set: an array consist of HOG features of training folder images.
Test_Set_MF: an array consist of HOG features of test folder images.
N: total number of images in training folder.
SVM should detect which images are male and which are female.
I will focus on how to use crossvalind for the leave-one-out-method.
I assume you want to select random sets inside a loop. N is the length of your data vector. M is the number of randomly selected observations in Test. Respectively M is the number of observations left out in Train. This means you have to set N to the length of your training-set. With M you can specify how many values you want in your Test-output, respectively you want to left out in your Train-output.
Here is an example, selecting M=2 observations out of the dataset.
dataset = [1 2 3 4 5 6 7 8 9 10];
N = length(dataset);
M = 2;
for i = 1:5
[Train, Test] = crossvalind('LeaveMOut', N, M);
% do whatever you want with Train and Test
dataset(Test) % display the test-entries
end
This outputs: (this is generated randomly, so you won't have the same result)
ans =
1 9
ans =
6 8
ans =
7 10
ans =
4 5
ans =
4 7
As you have it in your code according to this post, you need to adjust it for a matrix of features:
Training_Set = rand(10,3); % 10 samples with 3 features each
N = size(Training_Set,1);
M = 2;
for i = 1:5
[Train, Test] = crossvalind('LeaveMOut', N, 2);
Training_Set(Train,:) % displays the data to train
end
I am training a one vs all svm classifier. I used a 200 by 459 matrix to train the classifier using VLFeat svm classifier. (http://www.vlfeat.org/matlab/vl_svmtrain.html)
[W B] = vl_svmtrain(train_image_feats', tmp', .00001);
where train_image_feats' is a 200 by 459 matrix, and tmp' is the label matrix which is 1 by 459 vector.
The above command trains the svm with no problem, but then to classify the scores obtained on the test matrix I get an error. The test matrix is obviously not of the same size as that of the training matrix.
scores(i, :) = W'*test_image_feats' + B;
Where test_image_feats' is a 200 by 90 matrix. scores is a 9 by 459 matrix. 9 Because there are 9 categories(labels) to classify and 459 are the number of training images.
The above command gives the error:
Subscripted assignment dimension mismatch.
Error in svm_classify (line 56) scores(i, :) = W'*test_image_feats'
+ B;
Edit: Full code added..
categories = unique(train_labels);
num_categories = length(categories);
scores = zeros([num_categories size(train_labels, 1)]); %train_labels is 459 by 1 size
for i=1:num_categories %there are 9 categories
tmp = strcmp(train_labels, categories{i});
tmp = tmp - (1-tmp);
[W B] = vl_svmtrain(train_image_feats', tmp', .00001);
scores(i, :) = W'*test_image_feats' + B;
end
predicted_categories = cell(size(train_labels));
parfor i=1:size(test_image_feats,1)
image_scores = scores(:, i);
label_index = find(image_scores==max(image_scores));
predicted_categories{i}=categories(label_index);
end
Conceptually you are training a model with 459 training samples to predict the scores of 90 test samples.
scores = zeros([num_categories size(train_labels, 1)]);
isn't right as it will be the size of the training set. In fact you don't have to care at all about the size of the training set, you could train the model with 20 or 20000 images the prediction step shouldn't be any different.
scores have to be defined with the test case in mind
scores = zeros([num_categories size(test_labels, 1)]);
When you used 459 for both it only worked because size(test_labels, 1) was equal to size(train_labels, 1)
The problem is not with your right hand side of the assignment, but with score(i,:): you are trying to assign a 9-by-90 size matrix into a single row of score - this simply won't fit.
I'm making a decision tree using the classregtree(X,Y) function. I'm passing X as a matrix of size 70X9 (70 data objects, each having 9 attributes), and Y as a 70X1 matrix. Each one of my Y values is either 2 or 4. However, in the decision tree formed, it gives values of 2.5 or 3.5 for some of the leaf nodes.
Any ideas why this might be caused?
You are using classregtree in regression mode (which is the default mode).
Change the mode to classification mode.
Here is an example using CLASSREGTREE for classification:
%# load dataset
load fisheriris
%# split training/testing
cv = cvpartition(species, 'holdout',1/3);
trainIdx = cv.training;
testIdx = cv.test;
%# train
t = classregtree(meas(trainIdx,:), species(trainIdx), 'method','classification', ...
'names',{'SL' 'SW' 'PL' 'PW'});
%# predict
pred = t.eval(meas(testIdx,:));
%# evaluate
cm = confusionmat(species(testIdx),pred)
acc = sum(diag(cm))./sum(testIdx)
The output (confusion matrix and accuracy):
cm =
17 0 0
0 13 3
0 2 15
acc =
0.9
Now if your target class is encoded as numbers, the returned prediction will still be cell array of strings, so you have to convert them back to numbers:
%# load dataset
load fisheriris
[species,GN] = grp2idx(species);
%# ...
%# evaluate
cm = confusionmat(species(testIdx),str2double(pred))
acc = sum(diag(cm))./sum(testIdx)
Note that classification will always return strings, so I think you might have mistakenly used the method=regression option, which performs regression (numeric target) not classification (discrete target)