How to perform two group classification with deep neural network? (Matlab) - matlab

I'm new in machine learning (and to stackoverflow as well) and i want to make some classification tasks. I performed two group classifications on my data set (field of speech acoustics) with LIBSVM and Matlab's Pattern Recignition Tool from the Neural network toolbox to create a simple network with one hidden layer. In the hope of higher classification results i want to try Deep Neural Networks, and i found this code: http://www.mathworks.com/matlabcentral/fileexchange/42853-deep-neural-network
I have some difficulty understanding it.
My data is constructed of 127 samples of 19 parameters, so my input number is 19. I want to classify them in two groups: 0 and 1, so my output number is 1. The values in my data set are normalized between 0 and 1.
My code is the following:
clear all
clc
addpath('..');
load('data.mat')
inputdata = inputs;
outputdata = outputs;
datanum = 127;
outputnum = 1;
hiddennum = 3;
inputnum = 19;
% rbm = randRBM(inputnum, outputnum);
% rbm = pretrainRBM( rbm, inputdata );
dbn = randDBN([inputnum, hiddennum, outputnum]);
dbn = pretrainDBN( dbn, inputdata );
dbn = SetLinearMapping( dbn, inputdata, outputdata );
dbn = trainDBN( dbn, inputdata, outputdata );
estimate = v2h( dbn, inputdata )
[rmse AveErrNum] = CalcRmse(dbn, inputdata, outputdata)
The code runs. The rmse is 0.4183, the AveErrNum is 0.1969. What i need is the classification accuracy between my targets (stored in outputdata) and the networks predictions (Accuracy = data classified correctly / all data).
Where do i find the networks predictions after binarization?
Do I use the right type of network for my classification?
Don't I need to divide my data into Training, Validation and Testing samples (like in the case of a simple neural network with one hidden layer)?
Thanks in advance for any help!

Related

Correct practice and approach for reporting the training and generalization performance

I am trying to learn the correct procedure for training a neural network for classification. Many tutorials are there but they never explain how to report for the generalization performance. Can somebody please tell me if the following is the correct method or not. I am using first 100 examples from the fisheriris data set that has labels 1,2 and call them as X and Y respectively. Then I split X into trainData and Xtest with a 90/10 split ratio. Using trainData I trained the NN model. Now the NN internally further splits trainData into tr,val,test subsets. My confusion is which one is usually used for generalization purpose when reporting the performance of the model to unseen data in conferences/Journals?
The dataset can be found in the link: https://www.mathworks.com/matlabcentral/fileexchange/71468-simple-neural-networks-with-k-fold-cross-validation-manner
rng('default')
load iris.mat;
X = [f(1:100,:) l(1:100)];
numExamples = size(X,1);
indx = randperm(numExamples);
X = X(indx,:);
Y = X(:,end);
split1 = cvpartition(Y,'Holdout',0.1,'Stratify',true); %90% trainval 10% test
istrainval = training(split1); % index for fitting
istest = test(split1); % indices for quality assessment
trainData = X(istrainval,:);
Xtest = X(istest,:);
Ytest = Y(istest);
numExamplesXtrainval = size(trainData,1);
indxXtrainval = randperm(numExamplesXtrainval);
trainData = trainData(indxXtrainval,:);
Ytrain = trainData(:,end);
hiddenLayerSize = 10;
% data format = rows = number of dim, column = number of examples
net = patternnet(hiddenLayerSize);
net = init(net);
net.performFcn = 'crossentropy';
net.trainFcn = 'trainscg';
net.trainParam.epochs=50;
[net tr]= train(net,trainData', Ytrain');
Trained = sim(net, trainData'); %outputs predicted labels
train_predict = net(trainData');
performanceTrain = perform(net,Ytrain',train_predict)
lbl_train=grp2idx(Ytrain);
Yhat_train = (train_predict >= 0.5);
Lbl_Yhat_Train = grp2idx(Yhat_train);
[cmMatrixTrain]= confusionmat(lbl_train,Lbl_Yhat_Train )
accTrain=sum(lbl_train ==Lbl_Yhat_Train)/size(lbl_train,1);
disp(['Training Set: Total Train Acccuracy by MLP = ',num2str(100*accTrain ), '%'])
[confTest] = confusionmat(lbl_train(tr.testInd),Lbl_Yhat_Train(tr.testInd) )
%unknown test
test_predict = net(Xtest');
performanceTest = perform(net,Ytest',test_predict);
Yhat_test = (test_predict >= 0.5);
test_lbl=grp2idx(Ytest);
Lbl_Yhat_Test = grp2idx(Yhat_test);
[cmMatrix_Test]= confusionmat(test_lbl,Lbl_Yhat_Test )
This is the output.
Problem1: There seems to be no prediction for the other class. Why?
Problem2: Do I need a separate dataset like the one I created as Xtest for reporting generalization error or is it the practice to use the data trainData(tr.testInd,:) as the generalization test set? Did I create an unnecessary subset?
performanceTrain =
2.2204e-16
cmMatrixTrain =
45 0
45 0
Training Set: Total Train Acccuracy by MLP = 50%
confTest =
9 0
5 0
cmMatrix_Test =
5 0
5 0
There are a few issues with the code. Let's deal with them before answering your question. First, you set a threshold of 0.5 for making decisions (Yhat_train = (train_predict >= 0.5);) while all points of your net prediction are above 0.5. This means you only get zeros in your confusion matrices. You can plot the scores to choose a better threshold:
figure;
plot(train_predict(Ytrain == 1),'.b')
hold on
plot(train_predict(Ytrain == 2),'.r')
legend('label 1','label 2')
cvpartition gave me an error. It ran successfully as split1 = cvpartition(Y,'Holdout',0.1); In any case, artificial neural networks usuallly manage partitioning within the training process, so you feed in X and Y and some parameters for how to do it. See here for example: link where you set
net.divideParam.trainRatio = .4;
net.divideParam.valRatio = .3;
net.divideParam.testRatio = .3;
So how to report the results? Only for the test data. The train data will suffer from overfit, and will show false, too good results. If you use validation data (you havn't), then you cannot show results for it because it will also suffer from overfit. If you let the training do validation for you your test results will be safe from overfit.

Neural Nets with Pymc3

I am trying to use pymc3 to sample from the posterior, a set of single-hidden layer neural nets so that I could then convert the model to a hierarchical one, same as in Radford M.Neal's thesis. my first model looks like this:
def sample(nHiddenUnts,X,Y):
nFeatures = X.shape[1]
with pm.Model() as model:
#priors
bho = pm.Normal('hiddenOutBias',mu=0,sd=100)
who = pm.Normal('hiddenOutWeights',mu=0,sd=100,shape= (nHiddenUnts,1) )
bih = pm.Normal('inputBias',mu=0,sd=100 ,shape=nHiddenUnts)
wih= pm.Normal('inputWeights',mu=0,sd=100,shape=(nFeatures,nHiddenUnts))
netOut=T.dot( T.nnet.sigmoid( T.dot( X , wih ) + bih ) , who )+bho
#likelihood
likelihood = pm.Normal('likelihood',mu=netOut,sd=0.001,observed=Y)
start = pm.find_MAP()
step = pm.Metropolis()
trace = pm.sample(100000, step, start, progressbar=True)
return trace
and in the second model hyperpriors are added which are precisions of noise, input-to-hidden and hidden-to-output weights and biases( e.g. bihTau = precision of input->hidden bias ). The parameters for the hyperpriors are chosen so that they could be broad and also log-transformed.
#Gamma Hyperpriors
bhoTau, log_bhoTau = model.TransformedVar('bhoTau',
pm.Gamma.dist(alpha=1,beta=1e-2,testval=1e-4),
pm.logtransform)
WhoTau, log_WhoTau = model.TransformedVar('WhoTau',
pm.Gamma.dist(alpha=1,beta=1e-2,testval=1e-4),
pm.logtransform)
bihTau, log_bihTau = model.TransformedVar('bihTau',
pm.Gamma.dist(alpha=1,beta=1e-2,testval=1e-4),
pm.logtransform)
wihTau, log_wihTau = model.TransformedVar('wihTau',
pm.Gamma.dist(alpha=1,beta=1e-2,testval=1e-4),
pm.logtransform)
noiseTau, log_noiseTau = model.TransformedVar('noiseTau',
pm.Gamma.dist(alpha=1,beta=1e-2,testval=1e+4),
pm.logtransform)
#priors
bho = pm.Normal('hiddenOutBias',mu=0,tau=bhoTau)
who = pm.Normal('hiddenOutWeights',mu=0,tau=WhoTau,shape=(nHiddenUnts,1) )
bih = pm.Normal('inputBias',mu=0,tau=bihTau ,shape=nHiddenUnts)
wih= pm.Normal('inputWeights',mu=0,tau=wihTau ,shape= (nFeatures,nHiddenUnts))
.
.
.
start = pm.find_MAP()
step = pm.NUTS(scaling=start)
where bho,who,bin and win are biases and weights of hidden-to-output and input-to-hidden layers.
To check my models 3 to 5 sample points [0,1) was drawn from a one dimensional toy function of the following form
def g(x):
return np.prod( x+np.sin(2*np.pi*x),axis=1)
The first model (constant hyper-parameters) works fine! but when I sample from the posterior of the hyperparameters+parameters e.g. replace the priors in the first (above)listing with the ones in the second, neither the find_MAP() nor sampling method converges regardless of the number of samples and the resulted ANNs won't interpolate the sample points. Then I tried to integrate hyperpriors into my model one by one. The only one which could be integrated without problem was the noise precision. if I include any of the others then the sampler won't converge to the posterior. I tried this using one 'step function' for all the model variables and also combination of two separate step methods over parameters and hyperparams. In all the cases and with different number of samples the problem was still there.

Inconsistent/Different Test Performance/Error After Training Neural Network in Matlab

Keeping all parameters constant, I get different Mean Average Percentage Errors on my test data on retraining the neural network. Why is this so? Aren't all components of the neural network training process deterministic? Sometimes, I see a difference of up to 1% on successive trainings.
The training code is below
netFeb = newfit(trainX', trainY', networkConfigFeb);
netFeb.performFcn = 'mae';
netFeb = trainlm(netFeb, trainX', trainY');
% Index for the testing Data
startingInd = find(trainInd == 0, 1, 'first');
endingInd = startingInd + daysInMonth('Feb') - 1 ;
% Testing Data
testX = X(startingInd:endingInd,:);
testY = dailyPeakLoad(startingInd:endingInd,:);
actualLoadFeb = testY;
% Calculate the Forcast Load and the Mean Absolute Percentage Error
forecastLoadFeb = sim(netFeb, testX'';
errFeb = testY - forecastLoadFeb;
errpct = abs(errFeb)./testY*100;
MAPEFeb = mean(errpct(~isinf(errpct)));
As A. Donda hinted, since neural networks initialize their weights randomly, they will generate different networks after training. Thus it will give you different performance. While the training process is deterministic, the initial values are not! You may end up in different local minimums as a result or stop in different places.
If you wish to see why, take a look at Why should weights of Neural Networks be initialized to random numbers?
Edit 1:
Notes
Since the user is defining the testing/training data manually, there is no randomization of the training data sets selected

Matlab Neural Network for Classes - Unseen Data

Say I create a neural network to separate classes:
X1; %Some data in Class 1 100x2
X2; %Some data in Class 2 100x2
classInput = [X1;X2];
negative = zeros(N, 1);
positive = ones(N,1);
classTarget = [positive negative; negative positive];
net = feedforwardnet(20);
net = configure(net, classInput, classTarget);
net = train(net, classInput, classTarget);
%output of training data
output = net(classInput);
I can plot the classes and they are correctly separated:
figure();
hold on
style = {'ro' 'bx'};
for i=1:(2*N)
plot(classInput(i,1),classInput(i,2), style{round(output(i,1))+1});
end
However, how can I apply the network that's just been trained to unseen data? There must be a model which is generated by the network that can be applied to new data?
EDIT: Using sim:
Once the network is trained, if I use sim on the training data:
[Z,Xf,Af] = sim(net,classInput);
The result is as expected. But this only works if the input is of the same size. If for example I want to evalute an individual data point:
[Z1,Xf,Af] = sim(net,[1,2]);
size(Z) == size(Z1), but this clearly doesn't make sense? Surely I can evaluate a single data point?
I'm the OP,
I had assumed that the rows of the input matrices were the data samples and the columns were the "categories", this is the other way around. Transposing the matrices before inputting them to the train() function fixes this.

using precomputed kernels with libsvm

I'm currently working on classifying images with different image-descriptors. Since they have their own metrics, I am using precomputed kernels. So given these NxN kernel-matrices (for a total of N images) i want to train and test a SVM. I'm not very experienced using SVMs though.
What confuses me though is how to enter the input for training. Using a subset of the kernel MxM (M being the number of training images), trains the SVM with M features. However, if I understood it correctly this limits me to use test-data with similar amounts of features. Trying to use sub-kernel of size MxN, causes infinite loops during training, consequently, using more features when testing gives poor results.
This results in using equal sized training and test-sets giving reasonable results. But if i only would want to classify, say one image, or train with a given amount of images for each class and test with the rest, this doesn't work at all.
How can i remove the dependency between number of training images and features, so i can test with any number of images?
I'm using libsvm for MATLAB, the kernels are distance-matrices ranging between [0,1].
You seem to already have figured out the problem... According to the README file included in the MATLAB package:
To use precomputed kernel, you must include sample serial number as
the first column of the training and testing data.
Let me illustrate with an example:
%# read dataset
[dataClass, data] = libsvmread('./heart_scale');
%# split into train/test datasets
trainData = data(1:150,:);
testData = data(151:270,:);
trainClass = dataClass(1:150,:);
testClass = dataClass(151:270,:);
numTrain = size(trainData,1);
numTest = size(testData,1);
%# radial basis function: exp(-gamma*|u-v|^2)
sigma = 2e-3;
rbfKernel = #(X,Y) exp(-sigma .* pdist2(X,Y,'euclidean').^2);
%# compute kernel matrices between every pairs of (train,train) and
%# (test,train) instances and include sample serial number as first column
K = [ (1:numTrain)' , rbfKernel(trainData,trainData) ];
KK = [ (1:numTest)' , rbfKernel(testData,trainData) ];
%# train and test
model = svmtrain(trainClass, K, '-t 4');
[predClass, acc, decVals] = svmpredict(testClass, KK, model);
%# confusion matrix
C = confusionmat(testClass,predClass)
The output:
*
optimization finished, #iter = 70
nu = 0.933333
obj = -117.027620, rho = 0.183062
nSV = 140, nBSV = 140
Total nSV = 140
Accuracy = 85.8333% (103/120) (classification)
C =
65 5
12 38