Confusion matrix for RCNN Detector in Matlab - matlab

I tagged 300 images using the alexnet neural network I completed the training of the rcnn detector object. I can test the pictures one by one, containing 12 different people, 100 of which are not labeled.
testImage = imread('E:.....\....jpg');
[bboxes, score, label] = detect(detector, testImage, 'MiniBatchSize', 128)
T = 0.7;
idx = score >= T;
s = score(idx);
lbl = label(idx);
bbox = bboxes(idx, :);
outputImage = testImage;
for ii = 1 : size(bbox, 1)
annotation = sprintf('%s: (Confidence = %f)', lbl(ii), s(ii));
outputImage = insertObjectAnnotation(outputImage, 'rectangle', bbox(ii,:), annotation);
end
figure
imshow(outputImage)
But for 100 test pictures, each with 12 people (12 labels), I cannot plot confusion matrix that gives the success of the network, the correct number of predictions. There are many examples, which one should I use? For example, in an experiment I get an error like this.
Example
net = trainNetwork(XTrain,YTrain,layers,options);
[XTest,YTest] = trainingData;
YPredicted = classify(net,XTest);
plotconfusion(YTest,YPredicted)
Eror
[XTest,YTest] = detector;
Insufficient number of outputs from right hand side of equal sign to satisfy assignment.
I have 100 test pictures. There is a rcnn detector trained using Alexnet. How can I see the correct number of tagging of 12 people (12 labels) in 100 images with the confusion matrix? How can I create the success of the RCNN detector for 100 images and 12 different labels?

Related

Correct practice and approach for reporting the training and generalization performance

I am trying to learn the correct procedure for training a neural network for classification. Many tutorials are there but they never explain how to report for the generalization performance. Can somebody please tell me if the following is the correct method or not. I am using first 100 examples from the fisheriris data set that has labels 1,2 and call them as X and Y respectively. Then I split X into trainData and Xtest with a 90/10 split ratio. Using trainData I trained the NN model. Now the NN internally further splits trainData into tr,val,test subsets. My confusion is which one is usually used for generalization purpose when reporting the performance of the model to unseen data in conferences/Journals?
The dataset can be found in the link: https://www.mathworks.com/matlabcentral/fileexchange/71468-simple-neural-networks-with-k-fold-cross-validation-manner
rng('default')
load iris.mat;
X = [f(1:100,:) l(1:100)];
numExamples = size(X,1);
indx = randperm(numExamples);
X = X(indx,:);
Y = X(:,end);
split1 = cvpartition(Y,'Holdout',0.1,'Stratify',true); %90% trainval 10% test
istrainval = training(split1); % index for fitting
istest = test(split1); % indices for quality assessment
trainData = X(istrainval,:);
Xtest = X(istest,:);
Ytest = Y(istest);
numExamplesXtrainval = size(trainData,1);
indxXtrainval = randperm(numExamplesXtrainval);
trainData = trainData(indxXtrainval,:);
Ytrain = trainData(:,end);
hiddenLayerSize = 10;
% data format = rows = number of dim, column = number of examples
net = patternnet(hiddenLayerSize);
net = init(net);
net.performFcn = 'crossentropy';
net.trainFcn = 'trainscg';
net.trainParam.epochs=50;
[net tr]= train(net,trainData', Ytrain');
Trained = sim(net, trainData'); %outputs predicted labels
train_predict = net(trainData');
performanceTrain = perform(net,Ytrain',train_predict)
lbl_train=grp2idx(Ytrain);
Yhat_train = (train_predict >= 0.5);
Lbl_Yhat_Train = grp2idx(Yhat_train);
[cmMatrixTrain]= confusionmat(lbl_train,Lbl_Yhat_Train )
accTrain=sum(lbl_train ==Lbl_Yhat_Train)/size(lbl_train,1);
disp(['Training Set: Total Train Acccuracy by MLP = ',num2str(100*accTrain ), '%'])
[confTest] = confusionmat(lbl_train(tr.testInd),Lbl_Yhat_Train(tr.testInd) )
%unknown test
test_predict = net(Xtest');
performanceTest = perform(net,Ytest',test_predict);
Yhat_test = (test_predict >= 0.5);
test_lbl=grp2idx(Ytest);
Lbl_Yhat_Test = grp2idx(Yhat_test);
[cmMatrix_Test]= confusionmat(test_lbl,Lbl_Yhat_Test )
This is the output.
Problem1: There seems to be no prediction for the other class. Why?
Problem2: Do I need a separate dataset like the one I created as Xtest for reporting generalization error or is it the practice to use the data trainData(tr.testInd,:) as the generalization test set? Did I create an unnecessary subset?
performanceTrain =
2.2204e-16
cmMatrixTrain =
45 0
45 0
Training Set: Total Train Acccuracy by MLP = 50%
confTest =
9 0
5 0
cmMatrix_Test =
5 0
5 0
There are a few issues with the code. Let's deal with them before answering your question. First, you set a threshold of 0.5 for making decisions (Yhat_train = (train_predict >= 0.5);) while all points of your net prediction are above 0.5. This means you only get zeros in your confusion matrices. You can plot the scores to choose a better threshold:
figure;
plot(train_predict(Ytrain == 1),'.b')
hold on
plot(train_predict(Ytrain == 2),'.r')
legend('label 1','label 2')
cvpartition gave me an error. It ran successfully as split1 = cvpartition(Y,'Holdout',0.1); In any case, artificial neural networks usuallly manage partitioning within the training process, so you feed in X and Y and some parameters for how to do it. See here for example: link where you set
net.divideParam.trainRatio = .4;
net.divideParam.valRatio = .3;
net.divideParam.testRatio = .3;
So how to report the results? Only for the test data. The train data will suffer from overfit, and will show false, too good results. If you use validation data (you havn't), then you cannot show results for it because it will also suffer from overfit. If you let the training do validation for you your test results will be safe from overfit.

create training and testing set with ground truth for Hyper spectral satellite imagery

I am trying to create Training and Testing set out of my ground truth(observation) data which are presented in a tif (raster) format.
Actually, I have a hyperspectral image (Satellite image) which has 200 dimensions(channels/bands) along with the corresponding label(17 class) which are stored in another image. Now, my goal is to implement a classification algorithm and then check the accuracy with the testing dataset.
My problem is, that I do not know how can I describe to my algorithm that which pixel belongs to which class and then split them to taring and testing set.
I have provided a face idea of my goal which is as follows:
But I do not want to do this since I have 145 * 145 pixels dim, so it's not easy to define the location of these pixels and manually assign to their corresponding class.
note that the following example is for 3D image and I have 200D image and I have the labels (ground truth) so I do not need to specify them like the following code but I do want to assign them to their pixels member.
% Assigning pixel(by their location)to different groups.
tpix=[1309,640 ,1;... % Group 1
1218,755 ,1;...
1351,1409,2;... % Group 2
673 ,394 ,2;...
285 ,1762,3;... % Group 3
177 ,1542,3;...
538 ,1754,4;... % Group 4
432 ,1811,4;...
1417,2010,5;... % Group 5
163 ,1733,5;...
652 ,677 ,6;... % Group 6
864 ,1032,6];
row=tpix(:,1); % y-value
col=tpix(:,2); % x-value
group=tpix(:,3); % group number
ngroup=max(group);
% create trainingset
train=[];
for i=1:length(group)
train=[train; r(row(i),col(i)), g(row(i),col(i)), b(row(i),col(i))];
end %for
Do I understand this right? At the seconlast line the train variable gets the values it has until now + the pixels in red, green and blue? Like, you want them to be displayed only in red,green and blue? Only certain ones or all of them? I could imagine that we define an image matrix and then place the values in the images red, green and blue layers. Would that help? I'd make you the code if this is you issue :)
Edit: Solution
%download the .mats from the website and put them in folder of script
load 'Indian_pines_corrected.mat';
load 'Indian_pines_gt.mat';
ipc = indian_pines_corrected;
gt = indian_pines_gt;
%initiating cell
train = cell(16,1);
%loop to search class number of the x and y pixel coordinates
for c = 1:16
for i = 1:145
for j = 1:145
% if the classnumber is equal to the number in the gt pixel,
% then place the pixel from ipc(x,y,:) it in the train{classnumber}(x,y,:)
if gt(i,j) == c
train{c}(i,j,:) = ipc(i,j,:);
end %if
end %for j
end %for i
end %for c
Now you get the train cell that has a matrix in each cell. Each cell is one class and has only the pixels inside that you want. You can check for yourself if the classes correspond to the shape.
Eventually, I could solve my problem. The following code reshapes the matrix(Raster) to vector and then I index the Ground Truth data to find the corresponding pixel's location in Hyperspectral image.
Note that I am looking for an efficient way to construct Training and Testing set.
GT = indian_pines_gt;
data = indian_pines_corrected;
data_vec=reshape(data, 145*145,200);
GT_vec = reshape(GT,145*145,1);
[GT_vec_sort,idx] = sort(GT_vec);
%INDEXING.
index = find(and(GT_vec_sort>0,GT_vec_sort<=16));
classes_num = GT_vec_sort(index);
%length(index)
for k = 1: length(index)
classes(k,:) = data_vec(idx(index(k)),:);
end
figure(1)
plot(GT_vec_sort)
New.
I have done the following for creating Training and Testing set for #Hyperspectral images(Pine dataset). No need to use for loop
clear all
load('Indian_pines_corrected.mat');
load Indian_pines_gt.mat;
GT = indian_pines_gt;
data = indian_pines_corrected;
%Convert image from raster to vector.
data_vec = reshape(data, 145*145, 200);
%Provide location of the desired classes.
GT_loc = find(and(GT>0,GT<=16));
GT_class = GT(GT_loc)
data_value = data_vec(GT_loc,:)
% explanatories plus Respond variable.
%[200(variable/channel)+1(labels)= 201])
dat = [data_value, GT_class];
% create random Test and Training set.
[m,n] = size(dat);
P = 0.70 ;
idx = randperm(m);
Train = dat(idx(1:round(P*m)),:);
Test = dat(idx(round(P*m)+1:end),:);
X_train = Train(:,1:200); y_train = Train(:, 201);
X_test = Test(:,1:200); y_test = Test(:, 201);

Deploy and use LeNet on Matlab using MatCaffe

I am having issue with MatCaffe. I trained LeNet using my own dataset (2 classification, 0 or 1) in python successfully and trying to deploy it on Matlab now. Net architecture is from caffe/examples/mnist/lenet.prototxt. All the input images I fed into the net always return 1. (i tried using both positive and negative images from training).
Below is my code:
deployNet = 'lenet_deploy.prototxt';
caffeModel = 'weight.caffemodel';
caffe.set_mode_cpu();
net = caffe.Net(deployNet, caffeModel, 'test');
net.blobs('data').reshape([28 28 1 1]);
net.reshape();
patch_data = imread('cropped.jpg'); % already in greyscale
patch_data = imresize(patch_data, [28 28],'bilinear');
imshow(patch_data)
input_data = {patch_data};
scores = net.forward(input_data);
highest = max(scores{1});
disp(i);
disp(highest);
The highest always return 1 even for negative image. I tried deploying it on python and it works great. I am guessing issue with the way I pre-process the input.
Found out the issue. I forgotten to multiply the image with the training scale and transpose the width and height since Matlab is 1-indexed and column-major, the usual 4 blob dimensions in Matlab are [width, height, channels, num], and width is the fastest dimension. So just add in 2 more line of codes:
deployNet = 'lenet_deploy.prototxt';
caffeModel = 'weight.caffemodel';
caffe.set_mode_cpu();
net = caffe.Net(deployNet, caffeModel, 'test');
net.blobs('data').reshape([28 28 1 1]);
net.reshape();
patch = imread('cropped.jpg'); % already in greyscale
patch = single(patch) * 0.00390625; % multiply with scale
patch = permute(patch, [2,1,3]); %permute width and height
input_data = {patch};
scores = net.forward(input_data);
highest = scores{1};

Bag of words not correctly labeling responses

I am trying to implement Bag of Words in opencv and has come with the implementation below. I am using Caltech 101 database. However, since its my first time and not being familiar, I have planned to used two image sets from the database, the chair image set and the soccer ball image set. I have coded for the svm using this.
Everything went allright, except when I call classifier.predict(descriptor) , I do not get the label vale as intended. I always get a0 instead of '1', irrespective of my test image. The number of images in the chair dataset is 10 and in the soccer ball dataset is 10. I labelled chair as 0 and soccer ball as 1 . The links represent the samples of each categories, the top 10 is of chairs, the bottom 10 is of soccer balls
function hello
clear all; close all; clc;
detector = cv.FeatureDetector('SURF');
extractor = cv.DescriptorExtractor('SURF');
links = {
'http://i.imgur.com/48nMezh.jpg'
'http://i.imgur.com/RrZ1i52.jpg'
'http://i.imgur.com/ZI0N3vr.jpg'
'http://i.imgur.com/b6lY0bJ.jpg'
'http://i.imgur.com/Vs4TYPm.jpg'
'http://i.imgur.com/GtcwRWY.jpg'
'http://i.imgur.com/BGW1rqS.jpg'
'http://i.imgur.com/jI9UFn8.jpg'
'http://i.imgur.com/W1afQ2O.jpg'
'http://i.imgur.com/PyX3adM.jpg'
'http://i.imgur.com/U2g4kW5.jpg'
'http://i.imgur.com/M8ZMBJ4.jpg'
'http://i.imgur.com/CinqIWI.jpg'
'http://i.imgur.com/QtgsblB.jpg'
'http://i.imgur.com/SZX13Im.jpg'
'http://i.imgur.com/7zVErXU.jpg'
'http://i.imgur.com/uUMGw9i.jpg'
'http://i.imgur.com/qYSkqEg.jpg'
'http://i.imgur.com/sAj3pib.jpg'
'http://i.imgur.com/DMPsKfo.jpg'
};
N = numel(links);
trainer = cv.BOWKMeansTrainer(100);
train = struct('val',repmat({' '},N,1),'img',cell(N,1), 'pts',cell(N,1), 'feat',cell(N,1));
for i=1:N
train(i).val = links{i};
train(i).img = imread(links{i});
if ndims(train(i).img > 2)
train(i).img = rgb2gray(train(i).img);
end;
train(i).pts = detector.detect(train(i).img);
train(i).feat = extractor.compute(train(i).img,train(i).pts);
end;
for i=1:N
trainer.add(train(i).feat);
end;
dictionary = trainer.cluster();
extractor = cv.BOWImgDescriptorExtractor('SURF','BruteForce');
extractor.setVocabulary(dictionary);
for i=1:N
desc(i,:) = extractor.compute(train(i).img,train(i).pts);
end;
a = zeros(1,10)';
b = ones(1,10)';
labels = [a;b];
classifier = cv.SVM;
classifier.train(desc,labels);
test_im =rgb2gray(imread('D:\ball1.jpg'));
test_pts = detector.detect(test_im);
test_feat = extractor.compute(test_im,test_pts);
val = classifier.predict(test_feat);
disp('Value is: ')
disp(val)
end
These are my test samples:
Soccer Ball
(source: timeslive.co.za)
Chair
Searching through this site I think that my algorithm is okay, even though I am not quite confident about it. If anybody can help me in finding the bug, it will be appreciable.
Following Amro's code , this was my result:
Distribution of classes:
Value Count Percent
1 62 49.21%
2 64 50.79%
Number of training instances = 61
Number of testing instances = 65
Number of keypoints detected = 38845
Codebook size = 100
SVM model parameters:
svm_type: 'C_SVC'
kernel_type: 'RBF'
degree: 0
gamma: 0.5063
coef0: 0
C: 62.5000
nu: 0
p: 0
class_weights: 0
term_crit: [1x1 struct]
Confusion matrix:
ans =
29 1
1 34
Accuracy = 96.92 %
Your logic looks fine to me.
Now I guess you'll have to tweak the various parameters if you want to improve the classification accuracy. This includes the clustering algorithm parameters (such as the vocabulary size, clusters initialization, termination criteria, etc..), the SVM parameters (kernel type, the C coefficient, ..), the local features algorithm used (SIFT, SURF, ..).
Ideally, whenever you want to perform parameter selection, you ought to use cross-validation. Some methods already have such mechanism embedded (CvSVM::train_auto for instance), but for the most part you'll have to do this manually...
Finally you should follow general machine learning guidelines; see the whole bias-variance tradeoff dilemma. The online Coursera ML class discusses this topic in detail in week 6, and explains how to perform error analysis and use learning curves to decide what to try next (do we need to add more instances, increase model complexity, and so on..).
With that said, I wrote my own version of the code. You might wanna compare it with your code:
% dataset of images
% I previously saved them as: chair1.jpg, ..., ball1.jpg, ball2.jpg, ...
d = [
dir(fullfile('images','chair*.jpg')) ;
dir(fullfile('images','ball*.jpg'))
];
% local-features algorithm used
detector = cv.FeatureDetector('SURF');
extractor = cv.DescriptorExtractor('SURF');
% extract local features from images
t = struct();
for i=1:numel(d)
% load image as grayscale
img = imread(fullfile('images', d(i).name));
if ~ismatrix(img), img = rgb2gray(img); end
% extract local features
pts = detector.detect(img);
feat = extractor.compute(img, pts);
% store along with class label
t(i).img = img;
t(i).class = find(strncmp(d(i).name,{'chair','ball'},4));
t(i).pts = pts;
t(i).feat = feat;
end
% split into training/testing sets
% (a better way would be to use cvpartition from Statistics toolbox)
disp('Distribution of classes:')
tabulate([t.class])
tTrain = t([1:7 11:17]);
tTest = t([8:10 18:20]);
fprintf('Number of training instances = %d\n', numel(tTrain));
fprintf('Number of testing instances = %d\n', numel(tTest));
% build visual vocabulary (by clustering training descriptors)
K = 100;
bowTrainer = cv.BOWKMeansTrainer(K, 'Attempts',5, 'Initialization','PP');
clust = bowTrainer.cluster(vertcat(tTrain.feat));
fprintf('Number of keypoints detected = %d\n', numel([tTrain.pts]));
fprintf('Codebook size = %d\n', K);
% compute histograms of visual words for each training image
bowExtractor = cv.BOWImgDescriptorExtractor('SURF', 'BruteForce');
bowExtractor.setVocabulary(clust);
M = zeros(numel(tTrain), K);
for i=1:numel(tTrain)
M(i,:) = bowExtractor.compute(tTrain(i).img, tTrain(i).pts);
end
labels = vertcat(tTrain.class);
% train an SVM model (perform paramter selection using cross-validation)
svm = cv.SVM();
svm.train_auto(M, labels, 'SvmType','C_SVC', 'KernelType','RBF');
disp('SVM model parameters:'); disp(svm.Params)
% evaluate classifier using testing images
actual = vertcat(tTest.class);
pred = zeros(size(actual));
for i=1:numel(tTest)
descs = bowExtractor.compute(tTest(i).img, tTest(i).pts);
pred(i) = svm.predict(descs);
end
% report performance
disp('Confusion matrix:')
confusionmat(actual, pred)
fprintf('Accuracy = %.2f %%\n', 100*nnz(pred==actual)./numel(pred));
Here are the output:
Distribution of classes:
Value Count Percent
1 10 50.00%
2 10 50.00%
Number of training instances = 14
Number of testing instances = 6
Number of keypoints detected = 6300
Codebook size = 100
SVM model parameters:
svm_type: 'C_SVC'
kernel_type: 'RBF'
degree: 0
gamma: 0.5063
coef0: 0
C: 312.5000
nu: 0
p: 0
class_weights: []
term_crit: [1x1 struct]
Confusion matrix:
ans =
3 0
1 2
Accuracy = 83.33 %
So the classifier correctly labels 5 out of 6 images from the test set, which is not bad for a start :) Obviously you'll get different results each time you run the code due to the inherent randomness of the clustering step.
What is the number of images you are using to build your dictionary i.e. what is N? From your code, it seems that you are only using a 10 images (those listed in links). I hope this list is truncated down for this post else that would be too few. Typically you need in the order of 1000 or much more images to build the dictionary and the images need not be restricted to only these 2 classes that you are classifying. Otherwise, with only 10 images and 100 clusters your dictionary is likely to be messed up.
Also, you might want to use SIFT as a first choice as it tends to perform better than the other descriptors.
Lastly, you can also debug by checking the detected keypoints. You can get OpenCV to draw the keypoints. Sometimes your keypoint detector parameters are not set properly, resulting in too few keypoints getting detected, which in turn gives poor feature vectors.
To understand more about the BOW algorithm, you can take a look at these posts here and here. The second post has a link to a free pdf for an O'Reilley book on computer vision using python. The BOW model (and other useful stuff) is described in more details inside that book.
Hope this helps.

using precomputed kernels with libsvm

I'm currently working on classifying images with different image-descriptors. Since they have their own metrics, I am using precomputed kernels. So given these NxN kernel-matrices (for a total of N images) i want to train and test a SVM. I'm not very experienced using SVMs though.
What confuses me though is how to enter the input for training. Using a subset of the kernel MxM (M being the number of training images), trains the SVM with M features. However, if I understood it correctly this limits me to use test-data with similar amounts of features. Trying to use sub-kernel of size MxN, causes infinite loops during training, consequently, using more features when testing gives poor results.
This results in using equal sized training and test-sets giving reasonable results. But if i only would want to classify, say one image, or train with a given amount of images for each class and test with the rest, this doesn't work at all.
How can i remove the dependency between number of training images and features, so i can test with any number of images?
I'm using libsvm for MATLAB, the kernels are distance-matrices ranging between [0,1].
You seem to already have figured out the problem... According to the README file included in the MATLAB package:
To use precomputed kernel, you must include sample serial number as
the first column of the training and testing data.
Let me illustrate with an example:
%# read dataset
[dataClass, data] = libsvmread('./heart_scale');
%# split into train/test datasets
trainData = data(1:150,:);
testData = data(151:270,:);
trainClass = dataClass(1:150,:);
testClass = dataClass(151:270,:);
numTrain = size(trainData,1);
numTest = size(testData,1);
%# radial basis function: exp(-gamma*|u-v|^2)
sigma = 2e-3;
rbfKernel = #(X,Y) exp(-sigma .* pdist2(X,Y,'euclidean').^2);
%# compute kernel matrices between every pairs of (train,train) and
%# (test,train) instances and include sample serial number as first column
K = [ (1:numTrain)' , rbfKernel(trainData,trainData) ];
KK = [ (1:numTest)' , rbfKernel(testData,trainData) ];
%# train and test
model = svmtrain(trainClass, K, '-t 4');
[predClass, acc, decVals] = svmpredict(testClass, KK, model);
%# confusion matrix
C = confusionmat(testClass,predClass)
The output:
*
optimization finished, #iter = 70
nu = 0.933333
obj = -117.027620, rho = 0.183062
nSV = 140, nBSV = 140
Total nSV = 140
Accuracy = 85.8333% (103/120) (classification)
C =
65 5
12 38