My dataset is huge. Let X be input training data which is 6X140000 and T be targets, which are 3X140000.
net = patternnet(10);
% Set divide parameters
net.divideFcn = 'divideind';
net.divideParam.trainInd = loc_Train;
net.divideParam.testInd = loc_Test;
net.divideParam.valInd = loc_Valid;
net.trainFcn = 'trainscg';
% Set training parameters
net.trainParam.epochs = 1000;
net.trainParam.max_fail = 20;
net.trainParam.min_grad = 1e-20;
net.trainParam.goal = 1e-10; % Set a very small value
% Set network performance functions
net.performFcn = 'crossentropy';
net.performParam.regularization = 0.02;
net.performParam.normalization = 'none';
net.trainParam.showWindow = 0;
net.trainParam.showCommandLine = 1;
After I have setup my network, I run the following code to train my network.
[net, tr] = train(net, X, T);
The command line shows:
Calculation mode: MEX Training Pattern Recognition Neural Network
with TRAINSCG.
Epoch 0/1000, Time 0.001, Performance 0.0061672/1e-10,
Gradient 0.00065207/1e-20, Validation Checks 0/20
Epoch 20/1000, Time 2.214, Performance 0.0060292/1e-10, Gradient 6.3997e-05/1e-20, Validation Checks 20/20
Training with TRAINSCG completed: Validation
stop.
The tr object, which is the training record, holds information such as testing indices. however, tr.testInd returns empty.
Related
I'm trying to use k-fold cross-validation with the patternnet neural network.
inputs1 is a feature vector and targets1 is label vector from 'iris_dataset'. And xtrain, xtest, ytrain, and ytest are training & testing features and labels respectively after splitting using the cvpartition function.
The steps are as follows:
1.First of all, a Pattern Recognition Network (patternnet) is Created.
In the first and second scripts: net = patternnet;
2.After dividing data into k-folds using cvpartition, two training & testing features and labels are created (k=10).
In the first and second scripts: fold = cvpartition(targets_Vec, 'kfold', kfold);
3.Then, the configure command is used to configure the network object and also initializes the weights and biases of the network;
In the first script: net = configure(net, xtrain', dummyvar(ytrain)'); % xtrain and ytrain are features and labels from step (2).
or
In the second script: net = configure(net, inputs1, targets1); % inputs1 and targets1 are features and labels before splitting up.
4.After initializing the parameters and hyper-parameters, the network is trained using the training data (by the train() function).
In the first script: [net, tr] = train(net, xtrain', dummyvar(ytrain)'); % xtrain and ytrain are features and labels from step (2).
or
In the second script: [net, tr] = train(net, inputs1, targets1); % inputs1 and targets1 are features and labels before splitting up.
5.And finally, the targets are estimated using the trained network (by the net() function).
In the first script: pred = net(xtest'); % testing features from step (2).
or
In the second script: pred = net(inputs1);
Since the training & testing features are separated using cvpartition, so the network should be trained using training features and its labels and then it should be tested by testing features (new data).
Although, the train() function is used for training the network, but it splits its own input (training data and labels from step (2)) into training, validation and testing data while the original testing data from step (2) remains unused.
Therefore, I need a function that is used training features and labels from step 2 (train and validation) for learning and also another function to classifying the new data (testing features from step 2).
After searching, I wrote two scripts, I think the first one isn't correct but I don't sure the second one is also incorrect or not? How to solve it?
The first script:
clc; close all; clearvars;
load iris_dataset
max_iter = 10;
kfold = 10;
[inputs, targets] = iris_dataset;
inputs = inputs';
targets = targets';
targets_Vec= [];
for j = 1 : size(targets, 1)
if max(targets(j, 1:3) == 1) && find(targets(j, 1:3))==1
targets_Vec = [targets_Vec; 1];
elseif max(targets(j, 1:3) == 1) && find(targets(j, 1:3))==2
targets_Vec = [targets_Vec; 2];
elseif max(targets(j, 1:3) == 1) && find(targets(j, 1:3))==3
targets_Vec = [targets_Vec; 3];
end
end
net = patternnet; ... Create a Pattern Recognition Network
rng('default');
... Divide data into k-folds
fold = cvpartition(targets_Vec, 'kfold', kfold);
... Pre
pred2 = []; ytest2 = []; Afold = zeros(kfold,1);
... Neural network start
for k = 1 : kfold
... Call index of training & testing sets
trainIdx = fold.training(k); testIdx = fold.test(k);
... Call training & testing features and labels
xtrain = inputs(trainIdx,:); ytrain = targets_Vec(trainIdx);
xtest = inputs(testIdx,:); ytest = targets_Vec(testIdx);
... configure
net = configure(net, xtrain', dummyvar(ytrain)');
... Initialize neural network
net.layers{1}.name='Hidden Layer 1';
net.layers{2}.name='Output Layer';
net.layers{1}.size = 20;
net.layers{1}.transferFcn = 'tansig';
net.trainFcn = 'trainscg';
net.performFcn = 'crossentropy';
... Choose Input and Output Pre/Post-Processing Functions
net.input.processFcns = {'removeconstantrows','mapminmax'};
net.output.processFcns = {'removeconstantrows','mapminmax'};
... Train the Network
[net, tr] = train(net, xtrain', dummyvar(ytrain)');
... Estimate the targets using the trained network.(Test)
pred = net(xtest');
... Confusion matrix
[c, cm] = confusion(dummyvar(ytest)',pred);
... Get accuracy for each fold
Afold(k) = 100*sum(diag(cm))/sum(cm(:));
... Store temporary result for each fold
pred2 = [pred2(1:end,:), pred];
ytest2 = [ytest2(1:end); ytest];
end
... Overall confusion matrix
[~,confmat] = confusion(dummyvar(ytest2)', pred2);
confmat=transpose(confmat);
... Average accuracy over k-folds
acc = mean(Afold);
... Store results
NN.fold = Afold;
NN.acc = acc;
NN.con = confmat;
fprintf('\n Final classification Accuracy (NN): %g %%',acc);
The second script:
clc; close all; clearvars;
load iris_dataset
max_iter = 10;
kfold = 10;
[inputs1, targets1] = iris_dataset;
inputs = inputs1';
targets = targets1';
targets_Vec= [];
for j = 1 : size(targets, 1)
if max(targets(j, 1:3) == 1) && find(targets(j, 1:3))==1
targets_Vec = [targets_Vec; 1];
elseif max(targets(j, 1:3) == 1) && find(targets(j, 1:3))==2
targets_Vec = [targets_Vec; 2];
elseif max(targets(j, 1:3) == 1) && find(targets(j, 1:3))==3
targets_Vec = [targets_Vec; 3];
end
end
net = patternnet; ... Create a Pattern Recognition Network
rng('default');
... Divide data into k-folds
fold = cvpartition(targets_Vec, 'kfold', kfold);
... Pre
pred2 = []; ytest2 = []; Afold = zeros(kfold,1);
... Neural network start
for k = 1 : kfold
... Call index of training & testing sets
trainIdx = fold.training(k); testIdx = fold.test(k);
... Call training & testing features and labels
xtrain = inputs(trainIdx,:); ytrain = targets_Vec(trainIdx);
xtest = inputs(testIdx,:); ytest = targets_Vec(testIdx);
... configure
net = configure(net, inputs1, targets1);
trInd = find(trainIdx); tstInd = find(testIdx);
net.divideFcn = 'divideind';
net.divideParam.trainInd = trInd;
net.divideParam.testInd = tstInd;
... Initialize neural network
net.layers{1}.name='Hidden Layer 1';
net.layers{2}.name='Output Layer';
net.layers{1}.size = 20;
net.layers{1}.transferFcn = 'tansig';
net.trainFcn = 'trainscg';
net.performFcn = 'crossentropy';
... Choose Input and Output Pre/Post-Processing Functions
net.input.processFcns = {'removeconstantrows','mapminmax'};
net.output.processFcns = {'removeconstantrows','mapminmax'};
... Train the Network
[net, tr] = train(net, inputs1, targets1);
pred = net(inputs1); ... Estimate the targets using the trained network (Test)
... Confusion matrix
[c, cm] = confusion(targets1, pred);
y = net(inputs1);
e = gsubtract(targets1, y);
performance = perform(net, targets1, y);
tind = vec2ind(targets1);
yind = vec2ind(y);
percentErrors = sum(tind ~= yind)/numel(tind);
... Recalculate Training, Validation and Test Performance
trainTargets = targets1 .* tr.trainMask{1};
% valTargets = targets1 .* tr.valMask{1};
testTargets = targets1 .* tr.testMask{1};
trainPerformance = perform(net, trainTargets, y);
% valPerformance = perform(net, valTargets, y);
testPerformance = perform(net, testTargets, y);
test_Fold(k) = testPerformance;
end
test_Fold_mean = mean(test_Fold);
acc = 100*(1-test_Fold_mean);
fprintf('\n Final classification Accuracy (NN): %g %%',acc);
I have a multiclass classification task, and I have tried to use 'trainSoftmaxLayer' in Matlab, but it's a CPU implementation version, and is slow. So I tried to read the documentation for a GPU option, like 'trainSoftmaxLayer('useGPU', 'yes')' in traditional neural network, but there isn't any related options.
Finally, the problem is sovled by hacking the source code of trainSoftmaxLayer.m, which is provided by MATLAB. We can write our own GPU-enabled softmax layer like this:
function [net] = trainClassifier(x, t, use_gpu, showWindow)
net = network;
% define topology
net.numInputs = 1;
net.numLayers = 1;
net.biasConnect = 1;
net.inputConnect(1, 1) = 1;
net.outputConnect = 1;
% set values for labels
net.name = 'Softmax Classifier with GPU Option';
net.layers{1}.name = 'Softmax Layer';
% define transfer function
net.layers{1}.transferFcn = 'softmax';
% set parameters
net.performFcn = 'crossentropy';
net.trainFcn = 'trainscg';
net.trainParam.epochs = 1000;
net.trainParam.showWindow = showWindow;
net.divideFcn = 'dividetrain';
if use_gpu == 1
net = train(net, x, full(t), 'useGPU', 'yes');
else
net = train(net, x, full(t));
end
end
I want to train my recognition system with the feature dataset that I have created.
There are 9 inputs (features) and 40 targets (classes)
inputs and targets are on the same matrix. The first 9 columns are the features and the remaining 40 columns are the outputs
I have written this code below:
load dataSet.mat;
inputs = featureSet(:,1:9)';
targets = featureSet(:,10:49)';
% Create a Pattern Recognition Network
hiddenLayerSize = ns;
net = patternnet(hiddenLayerSize);
net.divideParam.trainRatio = trRa/100;
net.divideParam.valRatio = vaRa/100;
net.divideParam.testRatio = teRa/100;
% Train the Network
[net,tr] = train(net,inputs,targets);
% Test the Network
outputs = net(inputs);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs);
% Recalculate Training, Validation and Test Performance
trainTargets = targets .* tr.trainMask{1};
valTargets = targets .* tr.valMask{1};
testTargets = targets .* tr.testMask{1};
trainPerformance = perform(net,trainTargets,outputs);
valPerformance = perform(net,valTargets,outputs);
testPerformance = perform(net,testTargets,outputs);
I have copied this code from my previous working character recognition application. There is only two differences from the former application: input count and target count.
After starting training GUI window of nntraintool opens and no action is observed. Additionally, I get message at the bottom says "Performance goal met."
What could be the reason?
I got very different training efficiency with the following network
net = patternnet(hiddenLayerSize);
and the following one
net = feedforwardnet(hiddenLayerSize, 'trainscg');
net.layers{1}.transferFcn = 'tansig';
net.layers{2}.transferFcn = 'softmax';
net.performFcn = 'crossentropy';
on the same data.
I was thinking networks should be the same.
What thing I forgot?
UPDATE
The code below demonstrates, that network behavior is uniquely depends on network creation function.
Each type of network was ran two times. This excludes random generator issues or something. Data is the same.
hiddenLayerSize = 10;
% pass 1, with patternnet
net = patternnet(hiddenLayerSize);
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
[net,tr] = train(net,x,t);
y = net(x);
performance = perform(net,t,y);
fprintf('pass 1, patternnet, performance: %f\n', performance);
fprintf('num_epochs: %d, stop: %s\n', tr.num_epochs, tr.stop);
% pass 2, with feedforwardnet
net = feedforwardnet(hiddenLayerSize, 'trainscg');
net.layers{1}.transferFcn = 'tansig';
net.layers{2}.transferFcn = 'softmax';
net.performFcn = 'crossentropy';
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
[net,tr] = train(net,x,t);
y = net(x);
performance = perform(net,t,y);
fprintf('pass 2, feedforwardnet, performance: %f\n', performance);
fprintf('num_epochs: %d, stop: %s\n', tr.num_epochs, tr.stop);
% pass 1, with patternnet
net = patternnet(hiddenLayerSize);
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
[net,tr] = train(net,x,t);
y = net(x);
performance = perform(net,t,y);
fprintf('pass 3, patternnet, performance: %f\n', performance);
fprintf('num_epochs: %d, stop: %s\n', tr.num_epochs, tr.stop);
% pass 2, with feedforwardnet
net = feedforwardnet(hiddenLayerSize, 'trainscg');
net.layers{1}.transferFcn = 'tansig';
net.layers{2}.transferFcn = 'softmax';
net.performFcn = 'crossentropy';
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
[net,tr] = train(net,x,t);
y = net(x);
performance = perform(net,t,y);
fprintf('pass 4, feedforwardnet, performance: %f\n', performance);
fprintf('num_epochs: %d, stop: %s\n', tr.num_epochs, tr.stop);
Output follows:
pass 1, patternnet, performance: 0.116445
num_epochs: 353, stop: Validation stop.
pass 2, feedforwardnet, performance: 0.693561
num_epochs: 260, stop: Validation stop.
pass 3, patternnet, performance: 0.116445
num_epochs: 353, stop: Validation stop.
pass 4, feedforwardnet, performance: 0.693561
num_epochs: 260, stop: Validation stop.
Looks like those two aren't quite the same:
>> net = patternnet(hiddenLayerSize);
>> net2 = feedforwardnet(hiddenLayerSize,'trainscg');
>> net.outputs{2}.processParams{2}
ans =
ymin: 0
ymax: 1
>> net2.outputs{2}.processParams{2}
ans =
ymin: -1
ymax: 1
The net.outputs{2}.processFcns{2} is mapminmax so I gather that one of these is re-scaling it's output to match the output range of your real data better.
For future reference, you can do nasty dirty things like compare the interior data structures by casting to struct. So I did something like
n = struct(net); n2 = struct(net2);
for fn=fieldnames(n)';
if(~isequaln(n.(fn{1}),n2.(fn{1})))
fprintf('fields %s differ\n', fn{1});
end
end
to help pinpoint the differences.
As usually network not behave each training absolutely the same way. It can depend from three ( I mean I know three ) reasons:
Initial initialization of neural network.
Normalization of data
Scaling of data
If to speak about (1) the network is initially configured with random weights, in some small range with different signs. For example neuron with 6 inputs can get initial weights like this: 0.1, -0.3, 0.16, -0.23, 0.015, -0.0005. And this can lead to little bit another training result. If to speak about (2) if your normalization is performed poorly then learning algorithm converges to local minima, and can't jump out of it. The same applies to case (3) if your data is in need of scaling, and you didn't make it.
I created a neural network matlab. This is the script:
load dati.mat;
inputs=dati(:,1:8)';
targets=dati(:,9)';
hiddenLayerSize = 10;
net = patternnet(hiddenLayerSize);
net.inputs{1}.processFcns = {'removeconstantrows','mapminmax', 'mapstd','processpca'};
net.outputs{2}.processFcns = {'removeconstantrows','mapminmax', 'mapstd','processpca'};
net = struct(net);
net.inputs{1}.processParams{2}.ymin = 0;
net.inputs{1}.processParams{4}.maxfrac = 0.02;
net.outputs{2}.processParams{4}.maxfrac = 0.02;
net.outputs{2}.processParams{2}.ymin = 0;
net = network(net);
net.divideFcn = 'divideind';
net.divideMode = 'sample'; % Divide up every sample
net.divideParam.trainInd = 1:428;
net.divideParam.valInd = 429:520;
net.divideParam.testInd = 521:612;
net.trainFcn = 'trainscg'; % Scaled conjugate gradient backpropagation
net.performFcn = 'mse'; % Mean squared error
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', 'plotregression', 'plotconfusion', 'plotroc'};
net=init(net);
net.trainParam.max_fail=20;
[net,tr] = train(net,inputs,targets);
outputs = net(inputs);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs)
Now I want to save the weights and biases of the network and write the equation.
I had saved the weights and biases:
W1=net.IW{1,1};
W2=net.LW{2,1};
b1=net.b{1,1};
b2=net.b{2,1};
So, I've done the data preprocessing and I wrote the following equation
max_range=0;
[y,ps]=removeconstantrows(input, max_range);
ymin=0;
ymax=1;
[y,ps2]=mapminmax(y,ymin,ymax);
ymean=0;
ystd=1;
y=mapstd(x,ymean,ystd);
maxfrac=0.02;
y=processpca(y,maxfrac);
in=y';
uscita=tansig(W2*(tansig(W1*in+b1))+b2);
But with the same input input=[1:8] I get different results. why? What's wrong?
Help me please! It's important!
I use Matlab R2010B
It looks like you are pre-processing the inputs but not post-processing the outputs. Post processing uses the "reverse" processing form. (Targets are pre-processed, so outputs are reverse processed).
This equation
uscita=tansig(W2*(tansig(W1*in+b1))+b2);
is wrong. Why do you write two tansig? You have 10 nerouns you should write it 10 times or use for i=1:10;