While training a neural network in MATLAB I am using the "train" command. Does this command automatically divide the data into training, validation, and testing sets, or do we have to divide the data manually?
Yes, it does. But you can divide the data manually if you want to, using the net.divideFcn and net.divideParam fields of the net object:
t = 0:0.05:8; x = sin(t);
net = feedforwardnet(3);
net.divideFcn = 'dividerand'; % divide the data randomly
net.divideParam.trainRatio = 0.7; % we use 70% of the data for training
net.divideParam.valRatio = 0.3; % 30% is for validation
net.divideParam.testRatio = 0; % 0% for testing
net = train(net,t,x);
plot(t,x,t,net(t));
Here is an example of a manual data division:
net.divideFcn = 'divideind'; % divide the data manually
net.divideParam.trainInd = 1:100; % training data indices
net.divideParam.valInd = 101:140; % validation data indices
net.divideParam.testInd = 141:161; % testing data indices
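For reference, t = 0:0.05:8 contains 161 samples, so the three index vectors above cover the whole set. Here is a minimal end-to-end sketch combining this manual division with the toy data from the first example:
t = 0:0.05:8; x = sin(t);            % 161 samples in total
net = feedforwardnet(3);
net.divideFcn = 'divideind';         % divide the data manually
net.divideParam.trainInd = 1:100;    % training data indices
net.divideParam.valInd = 101:140;    % validation data indices
net.divideParam.testInd = 141:161;   % testing data indices
net = train(net, t, x);
plot(t, x, t, net(t));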
In case there are 2 inputs (X1 and X2) and 1 target output (t) to be estimated by a neural network (each variable has 6 samples):
X1 = [2.765405915 2.403146899 1.843932529 1.321474515 0.916837222 1.251301467];
X2 = [84870 363024 983062 1352580 804723 845200];
t = [-0.12685144347197 -0.19172223428950 -0.29330584684934 -0.35078062276141 0.03826908777226 0.06633047875487];
I was trying to find the best fit for predicting t by using multiple linear regression (ordinary least squares, or OLS) manually, and the outcomes were pretty good.
I intend to find a, b, and c (the regression coefficients) from this equation:
t = a + b*X1 + c*X2
Since the equation is the basic form of a multiple linear regression with two regressors, of course I can find the values of a, b, and c by doing OLS.
The problem is: I've tried to find the regression coefficients by using a neural network (built with MATLAB nftool and trained with Levenberg-Marquardt backpropagation, trainlm) but have no idea how to extract them, even though the outcomes showed less error than OLS.
This raises several questions:
Is it possible to find the regression coefficients by using a neural network?
If it is possible, what kind of ANN algorithm could solve this kind of problem, and how can it be built manually?
If you have any ideas how to solve it, please help. I really need your help!
This is the script generated by MATLAB nftool that I used to fit the output estimation:
% Solve an Input-Output Fitting problem with a Neural Network
% Script generated by NFTOOL
% Created Fri Jun 05 06:26:36 ICT 2015
%
% This script assumes these variables are defined:
%
% x - input data.
% t - target data.
x = [2.765405915 2.403146899 1.843932529 1.321474515 0.916837222 1.251301467; 84870 363024 983062 1352580 804723 845200];
t = [-0.12685144347197 -0.19172223428950 -0.29330584684934 -0.35078062276141 0.03826908777226 0.06633047875487];
inputs = x;
targets = t;
% Create a Fitting Network
hiddenLayerSize = 10;
net = fitnet(hiddenLayerSize);
% Setup Division of Data for Training, Validation, Testing
net.divideParam.trainRatio = 90/100;
net.divideParam.valRatio = 5/100;
net.divideParam.testRatio = 5/100;
% Train the Network
[net,tr] = train(net,inputs,targets);
% Test the Network
outputs = net(inputs);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs)
% View the Network
view(net)
% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, plotfit(net,inputs,targets)
%figure, plotregression(targets,outputs)
%figure, ploterrhist(errors)
A neural network will generally not find or encode a formula like t = a + b*X1 + c*X2 unless you build a really simple one with no hidden layers and a linear output. If you do, you can read the values [a, b, c] from the weights attached to the bias, input 1, and input 2. However, such a network offers no advantage over linear regression: essentially it is linear regression built with NN tools, using comparatively slow gradient descent to find the lowest least-squares error, when OLS finds it in a single pass.
What you have built instead is a more complex non-linear function. Most likely the error is low because you have over-fit your data, which is very easy to do with a neural net. With your input data as shown, it should be possible to get a training error of 0, but that is not as good as it seems: it just means the neural network has found a complex surface that connects all your examples, which is probably of limited use as a predictive model.
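To make the first point concrete, here is a minimal sketch (illustrative, not part of the original post) that recovers a, b, and c both by plain OLS with the backslash operator and from the weights of a single linear layer designed with newlind:
X1 = [2.765405915 2.403146899 1.843932529 1.321474515 0.916837222 1.251301467];
X2 = [84870 363024 983062 1352580 804723 845200];
t = [-0.12685144347197 -0.19172223428950 -0.29330584684934 -0.35078062276141 0.03826908777226 0.06633047875487];
% OLS in one pass: solve [1 X1' X2']*[a; b; c] = t' in the least-squares sense
A = [ones(numel(t),1) X1' X2'];
coeffs = A \ t';            % coeffs = [a; b; c]
% The equivalent "network": one linear layer, no hidden units, designed by least squares
net = newlind([X1; X2], t); % newlind solves for the weights directly
a = net.b{1};               % the bias weight is a
bc = net.IW{1,1};           % the input weights are [b c]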
In the MATLAB classic crab classification problem the neural network chooses the test samples from the provided samples only. Suppose I provide 30 samples to the neural network; it randomly takes 5 of them as test samples.
Is it possible to specify the test samples from the user side?
fid = fopen('/featureValues.csv');
C = textscan(fid,'%f%f%f%f%s','delimiter',','); % Import data
fclose(fid);
%%
% The first 4 columns of data represent the image's features
% The 5th column represents the category of the image.
features = [C{1} C{2} C{3} C{4}]; % inputs to neural network
% strncmpi with n=1 compares only the first character (t/l/D), which is
% enough here because the three category names start with different letters
tiger = strncmpi(C{5}, 'tiger', 1);
lion = strncmpi(C{5}, 'lion', 1);
dino = strncmpi(C{5}, 'Dino', 1);
% Encoding the image categories
imCat = double([tiger lion dino]); % targets for neural network
% The next step is to preprocess the data into a form that can be used with
% a neural network.
%
% The neural network object in the toolbox expects the samples along
% columns and its features along rows. Our dataset has its samples along
% rows and its features along columns. Hence the matrices have to be
% transposed.
features = features';
imCat = imCat';
%% Building the Neural Network Classifier
% The next step is to create a neural network that will learn to identify
% the class of the images.
%
% Since the neural network starts with random initial weights, the results
% of this demo will differ slightly every time it is run. The random seed
% is set to avoid this randomness. However this is not necessary for your
% own applications.
rand('seed', 491218382)
%%
% A 1-hidden layer feed forward network is created with 20 neurons in the
% hidden layer.
%
net = newff(features,imCat,20); % Create a new feed forward network
% Now the network is ready to be trained.
[net,tr] = train(net,features,imCat);
%% Testing the Classifier
testInputs = features(:,tr.testInd);
testTargets = imCat(:,tr.testInd);
out = net(testInputs); % Get response from trained network
[y_out,I_out] = max(out);           % predicted class = row index of max network output
[y_t,I_t] = max(testTargets);       % true class from the target encoding
N = size(testInputs,2);             % Number of testing samples
fprintf('Total testing samples: %d\n', N);
accuracy = 100*sum(I_out == I_t)/N; % percentage of correctly classified test samples
fprintf('Test accuracy: %.2f%%\n', accuracy);
Read here first, where they explain how to use different data division patterns. From the question it seems that you are looking specifically for this approach, so that you can specify the training, validation, and testing data by indices.
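A minimal sketch of that approach for the crab-style example above (the index split is illustrative; it must be set before calling train, so that tr.testInd reflects your choice):
net.divideFcn = 'divideind';       % divide the data by user-specified indices
net.divideParam.trainInd = 1:20;   % first 20 samples for training
net.divideParam.valInd = 21:25;    % next 5 for validation
net.divideParam.testInd = 26:30;   % last 5 are the user-chosen test samples
[net,tr] = train(net,features,imCat);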
I have been following Andrew Ng's course on Machine Learning, and I currently have some doubts about the implementation of a handwriting recognition tool.
First he says that he uses a subset of the MNIST dataset, which contains 5000 training examples, and each training example is an image in 20x20 grayscale format. With that he says that we have a vector of length 400 that is the "unrolled" version of the data previously described. Does it mean that the training set has something like the following format?
Training example 1 v[1,2,...,400]
Training example 2 v[1,2,...,400]
...
Training example 5000 v[1,2,...,400]
For the coding part the author gives the following complete code in Matlab:
%% Machine Learning Online Class - Exercise 3 | Part 2: Neural Networks
% Instructions
% ------------
%
% This file contains code that helps you get started on the
% linear exercise. You will need to complete the following functions
% in this exercise:
%
% lrCostFunction.m (logistic regression cost function)
% oneVsAll.m
% predictOneVsAll.m
% predict.m
%
% For this exercise, you will not need to change any code in this file,
% or any other files other than those mentioned above.
%
%% Initialization
clear ; close all; clc
%% Setup the parameters you will use for this exercise
input_layer_size = 400; % 20x20 Input Images of Digits
hidden_layer_size = 25; % 25 hidden units
num_labels = 10; % 10 labels, from 1 to 10
% (note that we have mapped "0" to label 10)
%% =========== Part 1: Loading and Visualizing Data =============
% We start the exercise by first loading and visualizing the dataset.
% You will be working with a dataset that contains handwritten digits.
%
% Load Training Data
fprintf('Loading and Visualizing Data ...\n')
load('ex3data1.mat');
m = size(X, 1);
% Randomly select 100 data points to display
sel = randperm(size(X, 1));
sel = sel(1:100);
displayData(X(sel, :));
fprintf('Program paused. Press enter to continue.\n');
pause;
%% ================ Part 2: Loading Parameters ================
% In this part of the exercise, we load some pre-initialized
% neural network parameters.
fprintf('\nLoading Saved Neural Network Parameters ...\n')
% Load the weights into variables Theta1 and Theta2
load('ex3weights.mat');
%% ================= Part 3: Implement Predict =================
% After training the neural network, we would like to use it to predict
% the labels. You will now implement the "predict" function to use the
% neural network to predict the labels of the training set. This lets
% you compute the training set accuracy.
pred = predict(Theta1, Theta2, X);
fprintf('\nTraining Set Accuracy: %f\n', mean(double(pred == y)) * 100);
fprintf('Program paused. Press enter to continue.\n');
pause;
% To give you an idea of the network's output, you can also run
% through the examples one at a time to see what it is predicting.
% Randomly permute examples
rp = randperm(m);
for i = 1:m
% Display
fprintf('\nDisplaying Example Image\n');
displayData(X(rp(i), :));
pred = predict(Theta1, Theta2, X(rp(i),:));
fprintf('\nNeural Network Prediction: %d (digit %d)\n', pred, mod(pred, 10));
% Pause
fprintf('Program paused. Press enter to continue.\n');
pause;
end
and the predict function should be completed by the students; I have done the following:
function p = predict(Theta1, Theta2, X)
%PREDICT Predict the label of an input given a trained neural network
% p = PREDICT(Theta1, Theta2, X) outputs the predicted label of X given the
% trained weights of a neural network (Theta1, Theta2)
% Useful values
m = size(X, 1);
num_labels = size(Theta2, 1);
% You need to return the following variables correctly
p = zeros(size(X, 1), 1);
X = [ones(m , 1) X];
% ====================== YOUR CODE HERE ======================
% Instructions: Complete the following code to make predictions using
% your learned neural network. You should set p to a
% vector containing labels between 1 to num_labels.
%
% Hint: The max function might come in useful. In particular, the max
% function can also return the index of the max element, for more
% information see 'help max'. If your examples are in rows, then, you
% can use max(A, [], 2) to obtain the max for each row.
%
a1 = X;                       % input layer activations (bias column already appended)
a2 = sigmoid(a1*Theta1');     % hidden layer activations
a2 = [ones(m , 1) a2];        % add the bias unit to the hidden layer
a3 = sigmoid(a2*Theta2');     % output layer activations
[M , p] = max(a3 , [] , 2);   % predicted label = column index of the max output per row
Even though it runs, I am not completely aware of how it really works (I have just followed the step-by-step instructions on the author's website). I have doubts about the following:
The author considers that X (the input) is an array of 5000 x 400 elements; in other words the network has 400 input neurons, 10 output neurons, and one hidden layer. Does it mean these 5000 x 400 values are the training set?
The author gives us the values of Theta1 and Theta2, which I believe serve as weights for the calculations in the inner layers, but how are these values obtained? Why does he use 25 neurons in the hidden layer and not 24 or 30?
Any help will be appreciated.
Thanks
Let's break your question into parts:
First he says that he uses a subset of the MNIST dataset, which contains 5000 training examples, and each training example is an image in 20x20 grayscale format. With that he says that we have a vector of length 400 that is the "unrolled" version of the data previously described. Does it mean that the training set has something like the following format? (...)
You're on the right track. Each training example is a 20x20 image. The simplest neural network model, introduced in the course, treats each image just as a simple 1x400 vector ("unrolling" means exactly this transformation). The dataset is stored in a matrix because this way you can perform computations faster, exploiting the efficient linear algebra libraries used by Octave/MATLAB. You don't necessarily need to store all training examples as a 5000x400 matrix, but this way your code will run faster.
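As a minimal sketch of the unrolling (the variable names are illustrative):
img = rand(20, 20);        % one 20x20 grayscale image
v = reshape(img, 1, 400);  % its unrolled 1x400 representation
% stacking 5000 such rows yields the 5000x400 training matrix X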
The author considers that X (the input) is an array of 5000 x 400 elements; in other words the network has 400 input neurons, 10 output neurons, and one hidden layer. Does it mean these 5000 x 400 values are the training set?
The "input layer" is nothing but the very input image. You can think of it as neurons whose output values were already calculated, or as values coming from outside the network (think of your retina: it is like the input layer of your visual system). Thus this network has 400 input units (the "unrolled" 20x20 image). But of course your training set doesn't consist of a single image, so you put all your 5000 images together in a single 5000x400 matrix to form the training set.
The author gives us the values of Theta1 and Theta2, which I believe serve as weights for the calculations in the inner layers, but how are these values obtained?
These theta values were found using an algorithm called backpropagation. If you haven't had to implement it in the course yet, just be patient; it might be in the exercises soon! By the way, yes, they are the weights.
Why does he use 25 neurons in the hidden layer and not 24 or 30?
He probably chose an arbitrary value that doesn't run too slowly, nor performs too poorly. You can probably find much better values for these hyperparameters, but if you increase them too much, the training procedure will probably take much longer. Also, since you are using just a small portion of the whole training set (the original MNIST has 60000 training examples and 28x28 images), you need to use a "small" number of hidden units to prevent overfitting. If you use too many units, your neurons will "learn by heart" the training examples and will not be able to generalize to new unseen data. Finding the hyperparameters, such as the number of hidden units, is a kind of art that you will master with experience (and maybe with Bayesian optimization and more advanced methods, but that's another story xD).
I did the same course some time ago.
X is the input data. Therefore X is the matrix consisting of the 5000 vectors of 400 elements each. There is no training step in this exercise, because the network comes pre-trained.
Normally the values for Theta1 and Theta2 are trained. How this is done is a subject for the next few lectures (the backpropagation algorithm).
I'm not entirely sure why he used 25 neurons in the hidden layer. However, my guess is that this number of neurons simply works without making the training step take forever.
I have a 10 by 57300 matrix as input, and a 1 by 57300 matrix as output that only contains 0s and 1s. I tried to train a neural network with feed-forward backpropagation and layer-recurrent backpropagation structures, each with one hidden layer of 40 neurons. In the best case the performance stopped at 0.133. I simulated the network with new inputs, but it did not give me the result I wanted, and the results were not even close to what I trained the network with. Do you have any suggestions to improve the performance of the network?
inputs = input;
targets = output;
% Create a Fitting Network
hiddenLayerSize = 50;
net = fitnet(hiddenLayerSize);
% Choose Input and Output Pre/Post-Processing Functions
% For a list of all processing functions type: help nnprocess
net.inputs{1}.processFcns = {'removeconstantrows','mapminmax'};
net.outputs{2}.processFcns = {'removeconstantrows','mapminmax'};
% Setup Division of Data for Training, Validation, Testing
% For a list of all data division functions type: help nndivide
net.divideFcn = 'divideind';
net.divideParam.trainInd=1:28650; % the first 50% of the samples are for training
net.divideParam.valInd=28651:42977; % the next 25% are for validation
net.divideParam.testInd=42978:57300; % the last 25% are for testing the network
% For help on training function 'trainlm' type: help trainlm
% For a list of all training functions type: help nntrain
net.trainFcn = 'trainlm'; % Levenberg-Marquardt
% Choose a Performance Function
% For a list of all performance functions type: help nnperformance
net.performFcn = 'mse'; % Mean squared error
% Choose Plot Functions
% For a list of all plot functions type: help nnplot
net.plotFcns = {'plotperform','plottrainstate','ploterrhist', ...
'plotregression', 'plotfit'};
% Train the Network
[net,tr] = train(net,inputs,targets);
% Test the Network
outputs = net(inputs);
errors = gsubtract(targets,outputs);
performance = perform(net,targets,outputs)
% Recalculate Training, Validation and Test Performance
trainTargets = targets .* tr.trainMask{1};
valTargets = targets .* tr.valMask{1};
testTargets = targets .* tr.testMask{1};
trainPerformance = perform(net,trainTargets,outputs)
valPerformance = perform(net,valTargets,outputs)
testPerformance = perform(net,testTargets,outputs)
% View the Network
view(net)
% Plots
% Uncomment these lines to enable various plots.
%figure, plotperform(tr)
%figure, plottrainstate(tr)
%figure, plotfit(net,inputs,targets)
%figure, plotregression(targets,outputs)
%figure, ploterrhist(errors)
This is the code that I used for training the neural network. My 57300 inputs divide into 300 groups of 191 samples; each group of inputs is a 10 by 191 block, which is why I used "divideind". I have normalized the input and output matrices to the [-1, 1] range because I use the tansig transfer function. But I still do not get the result that I want from the network.
It is usually complicated to tell how to improve a network with so little information. In some cases a performance of 0.133 could be a good result, in others not so much. Nonetheless, general ideas to improve networks include normalizing the inputs to the [0,1] range, performing feature selection, maybe adding an RBM layer and performing unsupervised training before the supervised backpropagation learning scheme (see deep belief networks), increasing the amount of training data, or using cross-validation to choose the free parameters and for early stopping.
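As a minimal sketch of the normalization suggestion (newInputs is an illustrative name for data not in the original post):
[inputsNorm, ps] = mapminmax(inputs, 0, 1);        % rescale each input row to [0, 1]
newInputsNorm = mapminmax('apply', newInputs, ps); % reuse the same scaling on new data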
Edit v1:
I see that you have 94 input sets for training, 94 for validation, and only 5 for testing. This ratio seems a bit error prone.
First of all, 94 input sets for a feature vector of length 10 is very few; for a feature vector of length 100 it is futile :). So basically the problem is that you don't have enough data to train 40 neurons. If you can't generate more data, I'd recommend a new split (see the sketch after this list):
150 Training
10 Validation
30 Testing
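A minimal sketch of that split in sample indices, assuming the counts refer to the asker's contiguous 191-sample groups (the mapping is illustrative):
g = 191;                                  % samples per group
net.divideParam.trainInd = 1:150*g;       % 150 groups for training
net.divideParam.valInd = 150*g+1:160*g;   % 10 groups for validation
net.divideParam.testInd = 160*g+1:190*g;  % 30 groups for testing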
The following is a rather more general approach to improving NN performance:
The training methodology
How many epochs? It is possible that the network is over-trained on your input set.
Parameters of the BP algorithm (momentum, adaptation factors, etc.)
The feature extraction algorithm
Often, this is the main problem. The algorithm is not able to really extract the specific features of an input.
I suggest plotting all the inputs and checking whether you can see any pattern and determine the separation visually. After all, a neural network is just a somewhat more complex statistical system.
Good luck!
I am not sure whether you are training your network for classification or regression, but if you want to improve the performance directly, I suggest the following:
1- Increase the number of nodes in the hidden layer further. Try something bigger to see if the neural network does better (the improvement may come from overtraining, but ignore that for now).
2- If there is very high non-linearity in the data space, add one more hidden layer. But adding many of them will not help in the case of a simple neural network.
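As a minimal sketch of suggestion 2 (the layer sizes are illustrative): fitnet accepts a vector of hidden layer sizes, so a second hidden layer can be added in one line:
net = fitnet([40 20]); % two hidden layers, with 40 and 20 neurons respectively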
I am trying to fit data that is described by a Gaussian distribution convolved with an exponential. An example of some perfect simulated data to illustrate my point:
clc; clear all
modeValue = 40; % mode value of our simulated gaussian
sig = 4; % standard deviation of our simulated gaussian
values = [1:0.5:100]; % vector for gaussian to be evaluated at
gauss=(1/(sqrt(2*pi)*sig)).*exp(-(values-modeValue).^2/(2*sig^2))'; % create our gaussian
gauss=gauss./max(gauss); % normalize
tau = logspace(-6,2,256); % generate our simulated x-data
for ii = 1:1:length(values)
data(:,ii) = exp(-tau.*values(ii)); % generated a data set
end
dataConv = data*gauss; % Weighted sum of the decay curves, using the gaussian weights
dataConv = dataConv./max(dataConv); % This is our final simulated data
semilogx(tau,dataConv,'ro');
If this code is run in MATLAB, a perfect generated data set dataConv is created.
My goal now is to recover our gauss plot by doing a lsqcurvefit on the generated data. The idea is to give starting values for modeValue, sig, and values, and have MATLAB automatically adjust them: generate all of the decay curves, combine all the decays into one data set, check the fit, then go back and tweak the input values.
I can fit a single exponential using lsqcurvefit, but I don't see how to do what I am after. Any help is greatly appreciated. Thanks.
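One possible approach, sketched here under assumptions (untested; the function name decayModel and the starting guess [35 5] are illustrative, and lsqcurvefit requires the Optimization Toolbox): rebuild the whole simulation inside a model function of the unknown parameters, and let lsqcurvefit adjust modeValue and sig until the model matches dataConv.
function y = decayModel(p, tau)
% p(1) = modeValue, p(2) = sig; rebuilds the simulated signal for a parameter guess
values = 1:0.5:100;                       % same grid used to simulate the data
g = exp(-(values - p(1)).^2/(2*p(2)^2));  % gaussian weights
g = g/max(g);                             % normalize, as in the simulation
D = exp(-tau(:)*values);                  % each column is one decay curve
y = D*g';                                 % weighted sum of the decays
y = y/max(y);                             % same normalization as dataConv
end
With the simulated tau and dataConv from above, the call would then look like:
pFit = lsqcurvefit(@decayModel, [35 5], tau, dataConv); % pFit(1) ~ modeValue, pFit(2) ~ sig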