GA trained NN performs worse on Test Set than BP trained NN - neural-network

I trained a Neural Network with a GA and with Backpropagation. The GA finds suitable weights for the training data but performs poorly on the test data. If I train the NN with BackPropagation, it performs much better on the test data even though the training error isn't much smaller than for the GA trained version. Even when I use the weights obtained by the GA as initial weights for Backpropagation, the NN performs worse on the test data than using only Backpropagation for training. Can anyone tell me where I could have made a mistake?

I suggest you read something about overfitting. In short you will be excelent at training set but poor at testing set(because NN follows anomaly and uncertainity and datas). Task of NN is generalize, but GA only perfect minimize error in training set(to be fair, this is GA task).
There are some methods how to deal with overfitting. I suggest you use validation set. First step is division your data into the three sets. Training testing and validation. Method is simple, you will train your NN with GA to minimalize error on training set, but you also run your NN on validation set, only run, not train. Error of network decrease on training set, but error should also decrease at validation set. So if error decrease at training set, but start increase at validation set, you must stop with learning(please don't stop at first iterations).
Hope it will be helpful.

I have encountered a similar problem, and the choice of the initial values of the neural network does not seem to affect the final classification accuracy. I used the feedforwardnet() function in matlab to compare the two cases. One is direct training, and the program gives random initial weights and bias values. One is to find the appropriate initial weights values and bias values through the GA algorithm, and then assign them to the neural network, and then start training. However, the latter approach does not improve the accuracy of neural network classification.

Related

How to deal with the randomness of NN training process?

Consider the training process of deep FF neural network using mini-batch gradient descent. As far as I understand, at each epoch of the training we have different random set of mini-batches. Then iterating over all mini batches and computing the gradients of the NN parameters we will get random gradients at each iteration and, therefore, random directions for the model parameters to minimize the cost function. Let's imagine we fixed the hyperparameters of the training algorithm and started the training process again and again, then we would end up with models, which completely differs from each other, because in those trainings the changes of model parameters were different.
1) Is it always the case when we use such random based training algorithms?
2) If it is so, where is the guaranty that training the NN one more time with the best hyperparameters found during the previous trainings and validations will yield us the best model again?
3) Is it possible to find such hyperparameters, which will always yield the best models?
Neural Network are solving a optimization problem, As long as it is computing a gradient in right direction but can be random, it doesn't hurt its objective to generalize over data. It can stuck in some local optima. But there are many good methods like Adam, RMSProp, momentum based etc, by which it can accomplish its objective.
Another reason, when you say mini-batch, there is at least some sample by which it can generalize over those sample, there can be fluctuation in the error rate, and but at least it can give us a local solution.
Even, at each random sampling, these mini-batch have different-2 sample, which helps in generalize well over the complete distribution.
For hyperparameter selection, you need to do tuning and validate result on unseen data, there is no straight forward method to choose these.

Why disable dropout during validation and testing?

I've seen in multiple places that you should disable dropout during validation and testing stages and only keep it during the training phase. Is there a reason why that should happen? I haven't been able to find a good reason for that and was just wondering.
One reason I'm asking is because I trained a model with dropout, and the results turned out well - about 80% accuracy. Then, I went on to validate the model but forgot to set the prob to 1 and the model's accuracy went down to about 70%. Is it supposed to be that drastic? And is it as simple as setting the prob to 1 in each dropout layer?
Thanks in advance!
Dropout is a random process of disabling neurons in a layer with chance p. This will make certain neurons feel they are 'wrong' in each iteration - basically, you are making neurons feel 'wrong' about their output so that they rely less on the outputs of the nodes in the previous layer. This is a method of regularization and reduces overfitting.
However, there are two main reasons you should not use dropout to test data:
Dropout makes neurons output 'wrong' values on purpose
Because you disable neurons randomly, your network will have different outputs every (sequences of) activation. This undermines consistency.
However, you might want to read some more on what validation/testing exactly is:
Training set: a set of examples used for learning: to fit the parameters of the classifier In the MLP case, we would use the training set to find the “optimal” weights with the back-prop rule
Validation set: a set of examples used to tune the parameters of a classifier In the MLP case, we would use the validation set to find the “optimal” number of hidden units or determine a stopping point for the back-propagation algorithm
Test set: a set of examples used only to assess the performance of a fully-trained classifier In the MLP case, we would use the test to estimate the error rate after we have chosen the final model (MLP size and actual weights) After assessing the final model on the test set, YOU MUST NOT tune the model any further!
Why separate test and validation sets? The error rate estimate of the final model on validation data will be biased (smaller than the true error rate) since the validation set is used to select the final model After assessing the final model on the test set, YOU MUST NOT tune the model any further!
source : Introduction to Pattern Analysis,Ricardo Gutierrez-OsunaTexas A&M University, Texas A&M University (answer)
So even for validation, how would you determine which nodes you remove if the nodes have a random probability of being disactivated?
Dropout is a method of making bagging practical for ensembles of very many large neural networks.
Along the same line we may remember that using the following false explanation:
For the new data, we can predict their classes by taking the average of the results from all the learners:
Since N is a constant we can just ignore it and the result remains the same, so we should disable dropout during validation and testing.
The true reason is much more complex. It is because of the weight scaling inference rule:
We can approximate p_{ensemble} by evaluating p(y|x) in one model: the model with all units, but with the weights going out of unit i multiplied by the probability of including unit i. The motivation for this modification is to capture the right expected value of the output from that unit. There is not yet any theoretical argument for the accuracy of this approximate inference rule in deep nonlinear networks, but empirically it performs very well.
When we train the model using dropout(for example for one layer) we zero out some outputs of some neurons and scale the others up by 1/keep_prob to keep the expectation of the layer almost the same as before. In the prediction process, we can use dropout but we can only get different predictions each time because we drop the values out randomly, then we need to run the prediction many times to get the expected output. Such a process is time-consuming so we can remove the dropout and the expectation of the layer remains the same.
Reference:
Difference between Bagging and Boosting?
7.12 of Deep Learning
Simplest reason can be, during prediction(test, validation or after production deployment) you want to use the capability of each and every learned neurons and really don't like to skip some of them randomly.
Thats the only reason we set probability as 1 during testing.
There is a Bayesian technique called Monte Carlo dropout in which the dropout would be not disabled during testing. The model will run several times with the same dropout rate(or in one go as a batch), and the mean(line 6 depicted below) and variance(line 7 depicted below) of the results will be calculated to determine the uncertainty.
Here is Uber's application to quantify uncertainty:
Short answer:
Dropouts to bring down over fitting in the training data. They are used as a regularization parameters. So if you have high variance (i.e. look at the difference between training set and validation set accuracy for this) then use drop out on training data, as it won't be good enough to apply dropout on test and validation data as you haven't been sure about the neurons which are going to shut off hence laying off the importance of random neurons which can be important.

how can I avoid recurrent neural network from overtraining?

I'm using a recurrent neural network for prediction. how should i avoid from overtraining? I've used gradient decent for parameter updates. the following figure shows the training and validation error for 500 epoch training and validation error
There are a few strategies that help against overfitting:
Regularization. You can add dropout or use L1 or L2 regularization on the weights.
Use a smaller model (i.e. have less parameters to fit)
Add random noise to your data
Add training data (if you can't get more natural data, you can perhaps generate it synthetically)
Use early stopping. This just means stopping training before it starts to overfit.
A typical approach is this: Use a big model, and use dropout and maybe also L2. Try a few different values for the amount of regularization, and use the best one. Monitor validation error as you train, and pick the model where it is the lowest (early stopping).

Accuracy of Neural network Output-Matlab ANN Toolbox

I'm relatively new to Matlab ANN Toolbox. I am training the NN with pattern recognition and target matrix of 3x8670 containing 1s and 0s, using one hidden layer, 40 neurons and the rest with default settings. When I get the simulated output for new set of inputs, then the values are around 0 and 1. I then arrange them in descending order and choose a fixed number(which is known to me) out of 8670 observations to be 1 and rest to be zero.
Every time I run the program, the first row of the simulated output always has close to 100% accuracy and the following rows dont exhibit the same kind of accuracy.
Is there a logical explanation in general? I understand that answering this query conclusively might require the understanding of program and problem, but its made of of several functions to clearly explain. Can I make some changes in the training to get consistence output?
If you have any suggestions please share it with me.
Thanks,
Nishant
Your problem statement is not clear for me. For example, what you mean by: "I then arrange them in descending order and choose a fixed number ..."
As I understand, you did not get appropriate output from your NN as compared to the real target. I mean, your output from NN is difference than target. If so, there are different possibilities which should be considered:
How do you divide training/test/validation sets for training phase? The most division should be assigned to training (around 75%) and rest for test/validation.
How is your training data set? Can it support most scenarios as you expected? If your trained data set is not somewhat similar to your test data sets (e.g., you have some new records/samples in the test data set which had not (near) appear in the training phase, it explains as 'outlier' and NN cannot work efficiently with these types of samples, so you need clustering approach not NN classification approach), your results from NN is out-of-range and NN cannot provide ideal accuracy as you need. NN is good for those data set training, where there is no very difference between training and test data sets. Otherwise, NN is not appropriate.
Sometimes you have an appropriate training data set, but the problem is training itself. In this condition, you need other types of NN, because feed-forward NNs such as MLP cannot work with compacted and not well-separated regions of data very well. You need strong function approximation such as RBF and SVM.

Matlab neural network testing

I have created a neural network and the performance is good. By using nprtool, we are allow to test the network with an input data and target data. Here is my question, what is the purpose of testing a neural network with target data provided? Isn't it testing should not hav e target data so that we can know how well can the trained neural network perform without target data is given? Hope someone will respond to this, thanks =)
I'm not familiar with nprtool, but I suspect it would give the input data to your neural network, and then compare your NN's output data with the target data (and compute some kind of success rate based on that).
So your NN will never see the target data, it's just used to measure the performance.
It's like the "teacher's edition" of the exercise books in school. The student (i.e. the NN) doesn't have the solutions, but her/his answers will be compared against them by the teacher (i.e. nprtool). (Okay, the teacher probably/hopefully knows the subject, but you get the idea.)
The "target" data t is the desired y of y=net(x) used as example to train the network.
What nprtool do is to divide the training set into three groups: the training set, the validation set and the test set.
The first one is used to actually update the network.
The second one is used to determine the performances of the net (note: this set is NOT used in any way to update the network): as the NN "learns" the error (as difference between the t and net(x)) over the validation set decreases. The trend will eventually stop or even reverse: this phenomena is called "overfitting", which means the NN is now chasing the training set, "memorizing" it at the cost of the ability to generalize (meaning: to perform well with unseen data). So the purpose of this validation set is to determine when to stop the training before the NN starts overfitting. This should answer your question.
Finally third set is for external testing, to leave you a set of data untouched by the training procedure.
Even though the total data set [training, validation and testing] are inputs to the training algorithm, the testing data is in no way used to design (i.e., train and validate) the net
total = design + test
design = train + validate
The training data is used to estimate weights and biases
The validation data is used to monitor the design performance on nontraining data. REGARDLESS OF THE PERFORMANCE ON TRAINING DATA, if validation performance degrades continuously for 6 (default) epochs, training is terminated (VALIDATION STOPPING).
This mitigates the dreaded phenomenon of OVERTRAINING AN OVERFIT NET where performance on nontraining data degrades even if the training set performance is improving.
An overfit net has more unknown weights and biases than training equations, thereby allowing an infinite number of solutions. A simple example of overfitting with two unknowns but only one equation:
KNOWN: a, b, c
FIND: unique x1 and x2
USING: a * x1 + b * x2 = c
Hope this helps.
Greg