I am training a neural network and it stopped training due to the gradient stopping condition. From what I can see the gradient 8.14e-06 is larger than the minimum gradient 1e-05, so why did it stop? Is it because the gradient wasn't improving, so there was little point in continuing?
I am very new to neural networks (and using MATLAB's nntool) so any help/explanation would be much appreciated.
This is not a neural network problem, it is a problem of understanding floating point representations:
8.14e-06 = 8.14 × 10^-6 = 0.00000814 < 0.00001 = 1.0 × 10^-5 = 1e-05
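If you want to convince yourself, the comparison the toolbox performs can be checked directly at the MATLAB prompt (purely illustrative):

8.14e-06 < 1e-05   % returns logical 1 (true), so the min_grad stopping condition was met

The current gradient is below the minimum gradient, so training stops as designed.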
I am looking at (two-layer) feed-forward Neural Networks in Matlab. I am investigating parameters that can minimise the classification error.
A google search reveals that these are some of them:
Number of neurons in the hidden layer
Learning Rate
Momentum
Training type
Epoch
Minimum Error
Any other suggestions?
I've varied the number of hidden neurons in Matlab from 1 to 10. I found that the classification error is close to 0% with 1 hidden neuron and then grows very slightly as the number of neurons increases. My question is: shouldn't a larger number of hidden neurons guarantee an equal or better answer, i.e. why might the classification error go up with more hidden neurons?
Also, how might I vary the Learning Rate, Momentum, Training type, Epoch and Minimum Error in Matlab?
Many thanks
Since you are considering a simple two-layer feed-forward network and have already pointed out six different things to consider for reducing classification error, I just want to add one more: the amount of training data. If you train a neural network with more data, it will work better. Note that training with a large amount of data is key to getting good results from neural networks, especially deep neural networks.
Why does the classification error go up with more hidden neurons?
The answer is simple: your model has over-fitted the training data, which results in poor performance on new data. Note that if you increase the number of neurons in the hidden layer, training error will decrease but testing error will increase.
As the hidden layer size increases, training error keeps falling while test error eventually starts rising; that gap is the signature of over-fitting.
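If you want to reproduce this effect yourself, here is a rough sketch (my own illustration, using the built-in iris_dataset and patternnet; substitute your own data) that measures training and test classification error for hidden layer sizes 1 to 10:

[x, t] = iris_dataset;                      % 4 features, 3 classes (one-hot targets), 150 samples
trainErr = zeros(1, 10);
testErr  = zeros(1, 10);
for h = 1:10
    net = patternnet(h);                    % two-layer feed-forward classifier with h hidden neurons
    net.trainParam.showWindow = false;      % suppress the training GUI
    net.divideParam.trainRatio = 0.7;       % 70% of samples for training
    net.divideParam.valRatio   = 0.0;
    net.divideParam.testRatio  = 0.3;       % 30% held out for testing
    [net, tr] = train(net, x, t);
    [~, predTrain]  = max(net(x(:, tr.trainInd)), [], 1);
    [~, predTest]   = max(net(x(:, tr.testInd)),  [], 1);
    [~, truthTrain] = max(t(:, tr.trainInd), [], 1);
    [~, truthTest]  = max(t(:, tr.testInd),  [], 1);
    trainErr(h) = mean(predTrain ~= truthTrain);
    testErr(h)  = mean(predTest  ~= truthTest);
end
plot(1:10, trainErr, 1:10, testErr);
legend('training error', 'test error');
xlabel('hidden layer size');

Training error typically stays near zero as the hidden layer grows, while test error starts to creep up once the network has more capacity than the data warrants.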
How may I vary the Learning Rate, Momentum, Training type, Epoch and Minimum Error in Matlab?
I expect you have already seen feed-forward neural nets in Matlab. You just need to set the second parameter of the function feedforwardnet(hiddenSizes,trainFcn), which is trainFcn - the training function.
For example, if you want to use gradient descent with momentum and adaptive learning rate backpropagation, then use traingdx as the training function. You can also use traingda if you want to use gradient descent with adaptive learning rate backpropagation.
You can change all the required parameters of the function as you want. For example, if you want to use traingda, you just need to follow these two steps.
Set net.trainFcn to traingda. This sets net.trainParam to traingda's default parameters.
Set net.trainParam properties to desired values.
Example
net = feedforwardnet(3,'traingda');  % 3 hidden neurons, traingda as the training function
net.trainParam.lr = 0.05;            % set the learning rate to 0.05
net.trainParam.epochs = 2000;        % set the maximum number of epochs
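To actually run the training with these settings, you then call train. Below is a minimal sketch; simplefit_dataset is just a built-in sample dataset standing in for your own inputs and targets:

[x, t] = simplefit_dataset;        % sample data shipped with the toolbox (replace with your own)
net = feedforwardnet(3, 'traingda');
net.trainParam.lr = 0.05;          % adaptive learning rate starts here
net.trainParam.epochs = 2000;      % maximum number of epochs
[net, tr] = train(net, x, t);      % tr is the training record (epochs, performance, stop reason)
y = net(x);                        % network outputs after training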
Please see the MATLAB documentation for traingda (gradient descent with adaptive learning rate backpropagation) and traingdx (gradient descent with momentum and adaptive learning rate backpropagation).
I'm having a problem setting up a proper Neural Network for one-class classification. Basically, I only have features that represent the background of an image, so the training phase would train the NN on those features. During the execution phase the NN will get features that could be "background" or "foreground" (the previous step is segmentation, which I've already done). NOTE: I can't train the NN on "foreground" because I don't know whether the segmentation process acquires only foreground objects. How am I supposed to set up my NN correctly?
Here is some piece of code:
toTrainFeat = computeFeatures(backBboxes, frame);
classes(1:size(backBboxes,1)) = 1;               % one class: every training sample is "background"
[net, Y, E] = adapt(net, toTrainFeat, classes);  % incremental learning
if numFrame >= 40 || sse(E) < 0.01               % classify only after 40 frames OR once the NN is good enough
    y = net(toClassifyFeat);
    y                                            % display the outputs
end
This code does not work; I think it's because I'm submitting only ONE class to the adapt method (in fact it crashes when it calls adapt). Any help?
Thanks a lot.
I have an RNN model. After about 10K iterations, the loss stops decreasing, but the loss is not very small yet. Does it always mean the optimization is trapped in a local minimum?
In general, what actions should I take to address this issue? Add more training data? Switch to a different optimization scheme (I'm using SGD now)? Or other options?
Many thanks!
JC
If you are training your neural network using a gradient-based algorithm such as Back Propagation or Resilient Propagation, it can stop improving when it finds a local minimum, and this is normal because of the nature of this type of algorithm. These propagation algorithms simply follow the direction the (gradient) vector is pointing.
As a suggestion, you could add a different strategy during training that explores the search space instead of only descending. For example, a Genetic Algorithm or the Simulated Annealing algorithm. These approaches provide an exploration of possibilities and can escape a local minimum in search of the global one. You could, for instance, run 10 iterations of the explorative approach for every 200 iterations of the propagation algorithm, creating a hybrid strategy. For example (it's just pseudo-code):
int epochs = 0;
do
{
    train();                          // one iteration of the propagation algorithm
    if (epochs % 200 == 0)
        trainExplorativeApproach();   // e.g. a few genetic-algorithm or simulated-annealing steps
    epochs++;
} while (epochs < 10000);
I've used a strategy like this with Multi-Layer Perceptrons and Elman recurrent neural networks on classification and regression problems, and in both cases the hybrid strategy provided better results than propagation training alone.
I don't quite understand why a sigmoid function is seen as more useful (for neural networks) than a step function... hoping someone can explain this for me. Thanks in advance.
The (Heaviside) step function is typically only useful within single-layer perceptrons, an early type of neural networks that can be used for classification in cases where the input data is linearly separable.
However, multi-layer neural networks or multi-layer perceptrons are of more interest because they are general function approximators and they are able to distinguish data that is not linearly separable.
Multi-layer perceptrons are trained using backpropagation. A requirement for backpropagation is a differentiable activation function. That's because backpropagation uses gradient descent on this function to update the network weights.
The Heaviside step function is non-differentiable at x = 0 and its derivative is 0 elsewhere. This means gradient descent won't be able to make progress in updating the weights and backpropagation will fail.
The sigmoid or logistic function does not have this shortcoming and this explains its usefulness as an activation function within the field of neural networks.
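To make this concrete, here is a small MATLAB sketch (my own illustration) comparing the step function with the logistic sigmoid and its derivative; the sigmoid's derivative sigma(x)*(1 - sigma(x)) is nonzero everywhere, which is exactly what gradient descent needs:

x = linspace(-6, 6, 601);
stepFn = double(x >= 0);        % Heaviside step: flat almost everywhere, so zero gradient
sig    = 1 ./ (1 + exp(-x));    % logistic sigmoid
dsig   = sig .* (1 - sig);      % sigmoid derivative: strictly positive for all x
plot(x, stepFn, x, sig, x, dsig);
legend('step', 'sigmoid', 'sigmoid derivative');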
It depends on the problem you are dealing with. In the case of simple binary classification, a step function is appropriate. Sigmoids can be useful when building more biologically realistic networks by introducing noise or uncertainty. Another, but completely different, use of sigmoids is for numerical continuation, i.e. when doing bifurcation analysis with respect to some parameter in the model. Numerical continuation is easier with smooth systems (and very tricky with non-smooth ones).
I've created an OCR with matlab's Neural Networks.
I've used traingdx
net.trainParam.epochs = 8000;
net.trainParam.min_grad = 0.0000;
net.trainParam.goal = 10e-6;
I've noticed that when I use different goals I get different results (as expected of course).
The weird thing is that I found that I have to "play" with the goal value to get good results.
I expected that the lower you go, the better the results and recognition. But I found that if I lower the goal to something like 10e-10, I actually get worse recognition results.
Any idea why lowering the goal would decrease the correctness of the Neural Network ?
I think it might have something to do with the network trying too hard to get it exactly right, so it doesn't cope as well with noise and variation.
My NN knowledge is a little rusty, but yes, training the network too much will overtrain it. This will make the network work better on the training vectors you give it, but worse for different inputs.
This is why you generally train it on a set of training vectors and then test the quality with a set of testing vectors. You can do the training iteratively: train on the training set to a certain goal accuracy, then check results for your testing set, increase your goal accuracy and repeat. Stop training when your result on the testing set is worse than what you previously had.
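A rough MATLAB sketch of that loop (my own illustration; simplefit_dataset and the alternating split are stand-ins for your OCR data), tightening the goal until the held-out error stops improving:

[x, t] = simplefit_dataset;                 % stand-in data; use your OCR features and targets
trainIdx = 1:2:size(x, 2);                  % simple alternating train/test split
testIdx  = 2:2:size(x, 2);

net = feedforwardnet(10, 'traingdx');
net.divideFcn = 'dividetrain';              % use every supplied sample for training
bestErr = Inf;
goal = 1e-3;                                % initial (loose) performance goal

while true
    net.trainParam.goal = goal;
    net = train(net, x(:, trainIdx), t(:, trainIdx));   % continues from the current weights
    err = t(:, testIdx) - net(x(:, testIdx));
    testErr = mean(err(:).^2);              % mse on the held-out set
    if testErr > bestErr
        break;                              % held-out error got worse: stop tightening
    end
    bestErr = testErr;
    goal = goal / 10;                       % tighten the goal and train some more
end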