Correct practice and approach for reporting the training and generalization performance - matlab

I am trying to learn the correct procedure for training a neural network for classification. Many tutorials are there but they never explain how to report for the generalization performance. Can somebody please tell me if the following is the correct method or not. I am using first 100 examples from the fisheriris data set that has labels 1,2 and call them as X and Y respectively. Then I split X into trainData and Xtest with a 90/10 split ratio. Using trainData I trained the NN model. Now the NN internally further splits trainData into tr,val,test subsets. My confusion is which one is usually used for generalization purpose when reporting the performance of the model to unseen data in conferences/Journals?
The dataset can be found in the link: https://www.mathworks.com/matlabcentral/fileexchange/71468-simple-neural-networks-with-k-fold-cross-validation-manner
rng('default')
load iris.mat;
X = [f(1:100,:) l(1:100)];
numExamples = size(X,1);
indx = randperm(numExamples);
X = X(indx,:);
Y = X(:,end);
split1 = cvpartition(Y,'Holdout',0.1,'Stratify',true); %90% trainval 10% test
istrainval = training(split1); % index for fitting
istest = test(split1); % indices for quality assessment
trainData = X(istrainval,:);
Xtest = X(istest,:);
Ytest = Y(istest);
numExamplesXtrainval = size(trainData,1);
indxXtrainval = randperm(numExamplesXtrainval);
trainData = trainData(indxXtrainval,:);
Ytrain = trainData(:,end);
hiddenLayerSize = 10;
% data format = rows = number of dim, column = number of examples
net = patternnet(hiddenLayerSize);
net = init(net);
net.performFcn = 'crossentropy';
net.trainFcn = 'trainscg';
net.trainParam.epochs=50;
[net tr]= train(net,trainData', Ytrain');
Trained = sim(net, trainData'); %outputs predicted labels
train_predict = net(trainData');
performanceTrain = perform(net,Ytrain',train_predict)
lbl_train=grp2idx(Ytrain);
Yhat_train = (train_predict >= 0.5);
Lbl_Yhat_Train = grp2idx(Yhat_train);
[cmMatrixTrain]= confusionmat(lbl_train,Lbl_Yhat_Train )
accTrain=sum(lbl_train ==Lbl_Yhat_Train)/size(lbl_train,1);
disp(['Training Set: Total Train Acccuracy by MLP = ',num2str(100*accTrain ), '%'])
[confTest] = confusionmat(lbl_train(tr.testInd),Lbl_Yhat_Train(tr.testInd) )
%unknown test
test_predict = net(Xtest');
performanceTest = perform(net,Ytest',test_predict);
Yhat_test = (test_predict >= 0.5);
test_lbl=grp2idx(Ytest);
Lbl_Yhat_Test = grp2idx(Yhat_test);
[cmMatrix_Test]= confusionmat(test_lbl,Lbl_Yhat_Test )
This is the output.
Problem1: There seems to be no prediction for the other class. Why?
Problem2: Do I need a separate dataset like the one I created as Xtest for reporting generalization error or is it the practice to use the data trainData(tr.testInd,:) as the generalization test set? Did I create an unnecessary subset?
performanceTrain =
2.2204e-16
cmMatrixTrain =
45 0
45 0
Training Set: Total Train Acccuracy by MLP = 50%
confTest =
9 0
5 0
cmMatrix_Test =
5 0
5 0

There are a few issues with the code. Let's deal with them before answering your question. First, you set a threshold of 0.5 for making decisions (Yhat_train = (train_predict >= 0.5);) while all points of your net prediction are above 0.5. This means you only get zeros in your confusion matrices. You can plot the scores to choose a better threshold:
figure;
plot(train_predict(Ytrain == 1),'.b')
hold on
plot(train_predict(Ytrain == 2),'.r')
legend('label 1','label 2')
cvpartition gave me an error. It ran successfully as split1 = cvpartition(Y,'Holdout',0.1); In any case, artificial neural networks usuallly manage partitioning within the training process, so you feed in X and Y and some parameters for how to do it. See here for example: link where you set
net.divideParam.trainRatio = .4;
net.divideParam.valRatio = .3;
net.divideParam.testRatio = .3;
So how to report the results? Only for the test data. The train data will suffer from overfit, and will show false, too good results. If you use validation data (you havn't), then you cannot show results for it because it will also suffer from overfit. If you let the training do validation for you your test results will be safe from overfit.

Related

Chance level accuracy for clearly separable data

I have written what I believe to be quite a simple SVM-classifier [SVM = Support Vector Machine]. "Testing" it with normally distributed data with different parameters, the classifier is returning me a 50% accuracy. What is wrong?
Here is the code, the results should be reproducible:
features1 = normrnd(1,5,[100,5]);
features2 = normrnd(50,5,[100,5]);
features = [features1;features2];
labels = [zeros(100,1);ones(100,1)];
%% SVM-Classification
nrFolds = 10; %number of folds of crossvalidation
kernel = 'linear'; % 'linear', 'rbf' or 'polynomial'
C = 1; % C is the 'boxconstraint' parameter.
cvFolds = crossvalind('Kfold', labels, nrFolds);
for i = 1:nrFolds % iterate through each fold
testIdx = (cvFolds == i); % indices test instances
trainIdx = ~testIdx; % indices training instances
% train the SVM
cl = fitcsvm(features(trainIdx,:), labels(trainIdx),'KernelFunction',kernel,'Standardize',true,...
'BoxConstraint',C,'ClassNames',[0,1]);
[label,scores] = predict(cl, features(testIdx,:));
eq = sum(labels(testIdx));
accuracy(i) = eq/numel(labels(testIdx));
end
crossValAcc = mean(accuracy)
You are not computing the accuracy correctly. You need to determine how many predictions match the original data. You are simply summing up the total number of 1s in the test set, not the actual number of correct predictions.
Therefore you must change your eq statement to this:
eq = sum(labels(testIdx) == label);
Recall that labels(testIdx) extracts the true label from your test set and label is the predicted results from your SVM model. This correctly generates a vector of 0/1 where 0 means that the prediction does not match the actual label from the test set and 1 means that they agree. Summing over each time they agree, or each time the vector is 1 is the way to compute the accuracy.

Neural network not training

I'm trying to set up a custom neural network, but when I train it, it doesn't train : the training process makes 0 iterations! I don't get any errors though, just 0 iterations, and I have no idea why. (The architecture might seem odd to you, it is supposed to be a custom PNN. But before we can even discuss if it makes sense or not, I would like to be able to train it...)
Here is the code
net = network;
net.trainFcn = 'trainlm';
net.performFcn = 'mse';
net.numInputs = 1;
net.numLayers = (2*nbclasses)+1; % (one pattern layer + one summation layer per class) + competition layer
net.inputConnect(1:nbclasses,:) = 1; % connects the input to all pattern layers
for i = 1:nbclasses % Connect the pattern layers to their corresponding summation layers
net.layerConnect(i+nbclasses,i) = 1;
net.layers{i}.size = size(tr_feature,1);
net.layers{i}.transferFcn = 'radbas';
end
for i = (nbclasses+1):(nbclasses*2) % Connect all summation layers to the competition layer
net.layers{i}.size = 1;
net.layerConnect(net.numLayers,i) = 1;
end
net.layers{net.numLayers}.transferFcn = 'compet';
net.outputConnect(1,end) = 1;
net.view;
[net, tr] = train(net,tr_feature',tr_true');
% tr_feature is a 800x2 data matrix, tr_true is the 800x1 corresponding labels
Any idea?
Thanks in advance!

Naive Bayse Classifier for Multiclass: Getting Same Error Rate

I have implemented the Naive Bayse Classifier for multiclass but problem is my error rate is same while I increase the training data set. I was debugging this over an over but wasn't able to figure why its happening. So I thought I ll post it here to find if I am doing anything wrong.
%Naive Bayse Classifier
%This function split data to 80:20 as data and test, then from 80
%We use incremental 5,10,15,20,30 as the test data to understand the error
%rate.
%Goal is to compare the plots in stanford paper
%http://ai.stanford.edu/~ang/papers/nips01-discriminativegenerative.pdf
function[tPercent] = naivebayes(file, iter, percent)
dm = load(file);
for i=1:iter
%Getting the index common to test and train data
idx = randperm(size(dm.data,1))
%Using same idx for data and labels
shuffledMatrix_data = dm.data(idx,:);
shuffledMatrix_label = dm.labels(idx,:);
percent_data_80 = round((0.8) * length(shuffledMatrix_data));
%Doing 80-20 split
train = shuffledMatrix_data(1:percent_data_80,:);
test = shuffledMatrix_data(percent_data_80+1:length(shuffledMatrix_data),:);
%Getting the label data from the 80:20 split
train_labels = shuffledMatrix_label(1:percent_data_80,:);
test_labels = shuffledMatrix_label(percent_data_80+1:length(shuffledMatrix_data),:);
%Getting the array of percents [5 10 15..]
percent_tracker = zeros(length(percent), 2);
for pRows = 1:length(percent)
percentOfRows = round((percent(pRows)/100) * length(train));
new_train = train(1:percentOfRows,:);
new_train_label = train_labels(1:percentOfRows);
%get unique labels in training
numClasses = size(unique(new_train_label),1);
classMean = zeros(numClasses,size(new_train,2));
classStd = zeros(numClasses, size(new_train,2));
priorClass = zeros(numClasses, size(2,1));
% Doing the K class mean and std with prior
for kclass=1:numClasses
classMean(kclass,:) = mean(new_train(new_train_label == kclass,:));
classStd(kclass, :) = std(new_train(new_train_label == kclass,:));
priorClass(kclass, :) = length(new_train(new_train_label == kclass))/length(new_train);
end
error = 0;
p = zeros(numClasses,1);
% Calculating the posterior for each test row for each k class
for testRow=1:length(test)
c=0; k=0;
for class=1:numClasses
temp_p = normpdf(test(testRow,:),classMean(class,:), classStd(class,:));
p(class, 1) = sum(log(temp_p)) + (log(priorClass(class)));
end
%Take the max of posterior
[c,k] = max(p(1,:));
if test_labels(testRow) ~= k
error = error + 1;
end
end
avgError = error/length(test);
percent_tracker(pRows,:) = [avgError percent(pRows)];
tPercent = percent_tracker;
plot(percent_tracker)
end
end
end
Here is the dimentionality of my data
x =
data: [768x8 double]
labels: [768x1 double]
I am using Pima data set from UCI
What are the results of your implementation of the training data itself? Does it fit it at all?
It's hard to be sure but there are couple things that I noticed:
It is important for every class to have training data. You can't really train a classifier to recognize a class if there was no training data.
If possible number of training examples shouldn't be skewed towards some of classes. For example if in 2-class classification number of training and cross validation examples for class 1 constitutes only 5% of the data then function that always returns class 2 will have error of 5%. Did you try checking precision and recall separately?
You're trying to fit normal distribution to each feature in a class and then use it for posterior probabilities. I'm not sure how it plays out in terms of smoothing. Could you try to re-implement it with simple counting and see if it gives any different results?
It also could be that features are highly redundant and bayes method overcounts probabilities.

using precomputed kernels with libsvm

I'm currently working on classifying images with different image-descriptors. Since they have their own metrics, I am using precomputed kernels. So given these NxN kernel-matrices (for a total of N images) i want to train and test a SVM. I'm not very experienced using SVMs though.
What confuses me though is how to enter the input for training. Using a subset of the kernel MxM (M being the number of training images), trains the SVM with M features. However, if I understood it correctly this limits me to use test-data with similar amounts of features. Trying to use sub-kernel of size MxN, causes infinite loops during training, consequently, using more features when testing gives poor results.
This results in using equal sized training and test-sets giving reasonable results. But if i only would want to classify, say one image, or train with a given amount of images for each class and test with the rest, this doesn't work at all.
How can i remove the dependency between number of training images and features, so i can test with any number of images?
I'm using libsvm for MATLAB, the kernels are distance-matrices ranging between [0,1].
You seem to already have figured out the problem... According to the README file included in the MATLAB package:
To use precomputed kernel, you must include sample serial number as
the first column of the training and testing data.
Let me illustrate with an example:
%# read dataset
[dataClass, data] = libsvmread('./heart_scale');
%# split into train/test datasets
trainData = data(1:150,:);
testData = data(151:270,:);
trainClass = dataClass(1:150,:);
testClass = dataClass(151:270,:);
numTrain = size(trainData,1);
numTest = size(testData,1);
%# radial basis function: exp(-gamma*|u-v|^2)
sigma = 2e-3;
rbfKernel = #(X,Y) exp(-sigma .* pdist2(X,Y,'euclidean').^2);
%# compute kernel matrices between every pairs of (train,train) and
%# (test,train) instances and include sample serial number as first column
K = [ (1:numTrain)' , rbfKernel(trainData,trainData) ];
KK = [ (1:numTest)' , rbfKernel(testData,trainData) ];
%# train and test
model = svmtrain(trainClass, K, '-t 4');
[predClass, acc, decVals] = svmpredict(testClass, KK, model);
%# confusion matrix
C = confusionmat(testClass,predClass)
The output:
*
optimization finished, #iter = 70
nu = 0.933333
obj = -117.027620, rho = 0.183062
nSV = 140, nBSV = 140
Total nSV = 140
Accuracy = 85.8333% (103/120) (classification)
C =
65 5
12 38

How to set output size in Matlab newff method

Summary:
I'm trying to do classification of some images depending on the angles between body parts.
I assume that human body consists of 10 parts(as rectangles) and find the center of each part and calculate the angle of each part by reference to torso.
And I have three action categories:Handwave-Walking-Running.
My goal is to find which test images fall into which action category.
Facts:
TrainSet:1057x10 feature set,1057 stands for number of image.
TestSet:821x10
I want my output to be 3x1 matrice each row showing the percentage of classification for action category.
row1:Handwave
row2:Walking
row3:Running
Code:
actionCat='H';
[train_data_hw train_label_hw] = tugrul_traindata(TrainData,actionCat);
[test_data_hw test_label_hw] = tugrul_testdata(TestData,actionCat);
actionCat='W';
[train_data_w train_label_w] = tugrul_traindata(TrainData,actionCat);
[test_data_w test_label_w] = tugrul_testdata(TestData,actionCat);
actionCat='R';
[train_data_r train_label_r] = tugrul_traindata(TrainData,actionCat);
[test_data_r test_label_r] = tugrul_testdata(TestData,actionCat);
Train=[train_data_hw;train_data_w;train_data_r];
Test=[test_data_hw;test_data_w;test_data_r];
Target=eye(3,1);
net=newff(minmax(Train),[10 3],{'logsig' 'logsig'},'trainscg');
net.trainParam.perf='sse';
net.trainParam.epochs=50;
net.trainParam.goal=1e-5;
net=train(net,Train);
trainSize=size(Train,1);
testSize=size(Test,1);
if(trainSize > testSize)
pend=-1*ones(trainSize-testSize,size(Test,2));
Test=[Test;pend];
end
x=sim(net,Test);
Question:
I'm using Matlab newff method.But my output is always an Nx10 matrice not 3x1.
My input set should be grouped as 3 classes but they are grouped to 10 classes.
Thanks
%% Load data : I generated some random data instead
Train = rand(1057,10);
Test = rand(821,10);
TrainLabels = randi([1 3], [1057 1]);
TestLabels = randi([1 3], [821 1]);
trainSize = size(Train,1);
testSize = size(Test,1);
%% prepare the input/output vectors (1-of-N output encoding)
input = Train'; %'matrix of size numFeatures-by-numImages
output = zeros(3,trainSize); % matrix of size numCategories-by-numImages
for i=1:trainSize
output(TrainLabels(i), i) = 1;
end
%% create net: one hidden layer with 10 nodes (output layer size is infered: 3)
net = newff(input, output, 10, {'logsig' 'logsig'}, 'trainscg');
net.trainParam.perf = 'sse';
net.trainParam.epochs = 50;
net.trainParam.goal = 1e-5;
view(net)
%% training
net = init(net); % initialize
[net,tr] = train(net, input, output); % train
%% performance (on Training data)
y = sim(net, input); % predict
%[err cm ind per] = confusion(output, y);
[maxVals predicted] = max(y); % predicted
cm = confusionmat(predicted, TrainLabels);
acc = sum(diag(cm))/sum(cm(:));
fprintf('Accuracy = %.2f%%\n', 100*acc);
fprintf('Confusion Matrix:\n');
disp(cm)
%% Testing (on Test data)
y = sim(net, Test');
Note how I converted from category label for each instance (1/2/3) to a 1-to-N encoding vector ([100]: 1, [010]: 2, [001]: 3)
Also note that the test set is currently not being used, since by default the input data is divided into train/test/validation. You could achieve your manual division by setting net.divideFcn to the divideind function, and setting the corresponding net.divideParam parameters.
I showed the testing on the same training data, but you could do the same for the Test data.