I have built an LSTM network in MATLAB. I tune the hyperparameters using Bayesian optimization and rolling cross-validation. When I train the network on the optimized hyperparameters, I get a different R2 (coefficient of determination) every time, because the weights are initialized randomly. I am wondering how I should report my results. Should I fix a random seed during hyperparameter tuning and use this same seed when I train on the full training set? Thank you
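One way to make the reported number less dependent on a single draw of initial weights is to repeat the final training over a handful of fixed seeds and report the mean and spread of R2. The sketch below assumes a hypothetical trainAndScore() wrapper around your own training code and a bestHyperparams struct holding the tuned values; it is an illustration, not the poster's actual setup.

```matlab
% Minimal sketch: trainAndScore() (hypothetical) trains the LSTM with the
% tuned hyperparameters and returns R^2 on the held-out test set.
seeds = 1:10;
r2 = zeros(numel(seeds), 1);
for k = 1:numel(seeds)
    rng(seeds(k), 'twister');    % fixes the random weight initialization for this run
    r2(k) = trainAndScore(XTrain, YTrain, XTest, YTest, bestHyperparams);
end
fprintf('R^2 = %.3f +/- %.3f over %d seeds\n', mean(r2), std(r2), numel(seeds));
```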
Related
I trained a Neural Network with a GA and with Backpropagation. The GA finds suitable weights for the training data but performs poorly on the test data. If I train the NN with BackPropagation, it performs much better on the test data even though the training error isn't much smaller than for the GA trained version. Even when I use the weights obtained by the GA as initial weights for Backpropagation, the NN performs worse on the test data than using only Backpropagation for training. Can anyone tell me where I could have made a mistake?
I suggest you read something about overfitting. In short, the network will be excellent on the training set but poor on the test set, because it ends up fitting the noise and anomalies in the training data. The task of a NN is to generalize, but the GA only minimizes the error on the training set as perfectly as it can (to be fair, that is the GA's task).
There are several methods for dealing with overfitting. I suggest you use a validation set. The first step is to divide your data into three sets: training, testing, and validation. The method is simple: train your NN with the GA to minimize the error on the training set, but also run it on the validation set (only run, do not train on it). The error on the training set will decrease, and the error on the validation set should also decrease. So if the error keeps decreasing on the training set but starts to increase on the validation set, you must stop training (just don't stop in the very first iterations).
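A minimal early-stopping sketch of this idea, assuming hypothetical gaStep() and nnError() helpers standing in for your own GA and network code:

```matlab
% Sketch only: gaStep() runs one GA generation on the training data and
% nnError() evaluates the network without training it. Keep the weights
% from the generation with the lowest validation error.
% 'weights' is the GA's current solution, assumed already initialized.
maxGen = 500;  patience = 20;  stall = 0;  bestValErr = inf;
for gen = 1:maxGen
    weights = gaStep(weights, XTrain, YTrain);   % train on the training set only
    valErr  = nnError(weights, XVal, YVal);      % run on the validation set, never train on it
    if valErr < bestValErr
        bestValErr = valErr;  bestWeights = weights;  stall = 0;
    else
        stall = stall + 1;
        if stall >= patience, break; end         % validation error keeps rising: stop
    end
end
```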
Hope it will be helpful.
I have encountered a similar problem, and the choice of the initial values of the neural network does not seem to affect the final classification accuracy. I used the feedforwardnet() function in MATLAB to compare two cases. In one, the network is trained directly, with the program assigning random initial weights and biases. In the other, suitable initial weights and biases are found with a GA, assigned to the neural network, and training starts from there. However, the latter approach does not improve the classification accuracy of the neural network.
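For reference, this is roughly how GA-found values can be handed to feedforwardnet() before training; W1_ga, W2_ga, b1_ga, b2_ga are placeholders for whatever your GA returns, and whether this helps accuracy is a separate question, as observed above.

```matlab
% Assign externally found weights before training (sketch).
net = feedforwardnet(10);            % 10 hidden neurons, as an example
net = configure(net, X, T);          % fix layer sizes for inputs X and targets T
net.IW{1,1} = W1_ga;                 % input-to-hidden weights from the GA
net.LW{2,1} = W2_ga;                 % hidden-to-output weights from the GA
net.b{1}   = b1_ga;
net.b{2}   = b2_ga;
net = train(net, X, T);              % backpropagation starts from these values
```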
I mostly understand how k-fold cross-validation works and have begun implementing it in my MATLAB scripts; however, I have two questions.
When using it to select network features (hidden units, weight decay prior, and number of iterations in my case), should I re-initialise the weights after each fold, or should I just feed the next training fold into the already trained network (whose weights have been optimised for the previous fold)?
It seems that the latter should give lower errors, as the previous fold of data will be a good approximation of the next, so the weights will be closer to a good solution than weights initialised randomly from a Gaussian distribution.
Additionally, having validated the network using k-fold validation and chosen the hyperparameters etc., when I want to start using the network, am I right in thinking that I should stop using k-fold validation and just train once on all of the available data?
Many thanks for any help.
Yes, you should reinitialize the weights after each fold in order to start with a "blank" network. If you don't do this, the folds will "leak" into each other, and that's not what k-fold CV is supposed to do.
After finding the best hyperparameters, yes, you can train on all the available data. Just remember to keep some hold-out test data for a final evaluation.
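A minimal sketch of both steps, assuming a hypothetical buildAndTrain() helper that creates a fresh network and trains it (cvpartition comes from the Statistics and Machine Learning Toolbox):

```matlab
% X: inputs (one column per sample), T: targets, hyperparams: values being evaluated.
N  = size(X, 2);
cv = cvpartition(N, 'KFold', 5);
valErr = zeros(cv.NumTestSets, 1);
for k = 1:cv.NumTestSets
    idxTr = training(cv, k);
    idxVa = test(cv, k);
    net = buildAndTrain(X(:, idxTr), T(:, idxTr), hyperparams);   % fresh weights every fold
    valErr(k) = perform(net, T(:, idxVa), net(X(:, idxVa)));      % error on the held-out fold
end
% After choosing the best hyperparameters, train once on everything
% (keep a separate hold-out test set for the final evaluation).
finalNet = buildAndTrain(X, T, bestHyperparams);
```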
I've built a digit recognizer (56x56 digits) using neural networks, but I'm getting 89.5% accuracy on the test set and 100% on the training set. I know that it's possible to get >95% on the test set with this training set. Is there any way to improve my training so I can get better predictions? Increasing the iterations from 300 to 1000 gave me only +0.12% accuracy. I'm also limited by file size, so increasing the number of nodes may not be possible, but if that's what it takes, maybe I could cut some pixels/nodes from the input layer.
To train I'm using:
input layer: 3136 nodes
hidden layer: 220 nodes
labels: 36
regularized cost function with lambda=0.1
fmincg to calculate weights (1000 iterations)
As mentioned in the comments, the easiest and most promising way is to switch to a Convolutional Neural Network. But with your current model you can:
Add more layers with fewer neurons each, which increases learning capacity and should increase accuracy a bit. The problem is that you might start overfitting; use regularization to counter this.
Use batch normalization (BN). While you are already using regularization, BN accelerates training and also acts as a regularizer, and as an NN-specific technique it might work better.
Make an ensemble. Train several NNs on the same dataset but with different initializations. This produces slightly different classifiers, and you can combine their outputs to get a small increase in accuracy (see the sketch after this list).
Cross-entropy loss. You don't mention which loss function you are using; if it's not cross-entropy, you should start using it. Most high-accuracy classifiers use a cross-entropy loss.
Switch to stochastic gradient descent. I do not know the effect of changing the optimization algorithm in your case, but mini-batch SGD might outperform the batch optimizer (fmincg) you are currently using, and you could combine it with variants such as Adagrad or Adam.
Other small changes that might increase accuracy are changing the activation function (e.g. to ReLU), shuffling the training samples after every epoch, and doing data augmentation.
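As an illustration of the ensemble point above, here is a rough sketch assuming hypothetical trainOneNet() and predictProbs() helpers that wrap your fmincg-based training and forward pass:

```matlab
% Train M networks that differ only in their random initialization and
% average their class probabilities; the averaged prediction is usually a
% bit more accurate than any single member.
M = 5;
numTest = size(Xtest, 1);                          % rows are examples, as in the fmincg setup
probs = zeros(numTest, 36);                        % 36 labels
for m = 1:M
    rng(m);                                        % different initial weights per member
    Theta = trainOneNet(Xtrain, ytrain, lambda);   % your regularized training, lambda = 0.1
    probs = probs + predictProbs(Theta, Xtest) / M;
end
[~, pred] = max(probs, [], 2);
accuracy = mean(pred == ytest);                    % ytest: column of labels 1..36
```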
I'm using a recurrent neural network for prediction. How should I avoid overtraining? I've used gradient descent for the parameter updates. The figure below shows the training and validation error over 500 epochs. [figure: training and validation error]
There are a few strategies that help against overfitting:
Regularization. You can add dropout or use L1 or L2 regularization on the weights.
Use a smaller model (i.e. have fewer parameters to fit)
Add random noise to your data
Add training data (if you can't get more natural data, you can perhaps generate it synthetically)
Use early stopping. This just means stopping training before it starts to overfit.
A typical approach is this: Use a big model, and use dropout and maybe also L2. Try a few different values for the amount of regularization, and use the best one. Monitor validation error as you train, and pick the model where it is the lowest (early stopping).
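A rough sketch of that recipe for a plain gradient-descent loop, with an L2 (weight-decay) term and a snapshot of the best model; gradAndError() and valError() are placeholders for your own forward/backward code:

```matlab
% W: all parameters unrolled into one vector (assumed already initialized).
lr = 0.01;  lambda = 1e-4;  bestValErr = inf;
for epoch = 1:500
    [grad, trainErr] = gradAndError(W, XTrain, YTrain);
    W = W - lr * (grad + lambda * W);        % L2 regularization shrinks the weights each step
    valErr = valError(W, XVal, YVal);        % monitor only, never train on this
    if valErr < bestValErr
        bestValErr = valErr;                 % keep the parameters from the best epoch
        bestW = W;                           % this is the early-stopping model you deploy
    end
end
```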
I am experimenting with Elman and/or Jordan ANNs using the Encog framework. I am trying to code my own, but am studying how Encog has implemented them. I see how backpropagation through time updates the weights, but how do the context neurons get updated? Their values seem to fluctuate somewhat randomly as the output of the neural network is calculated. How is it that these values actually allow the simple recurrent neural network to recognize patterns in input data over time?
The context neuron values themselves are not updated as training progresses. The weights between them and the next layer ARE updated during training. The context values are updated as the neural network runs.
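A generic Elman-style forward pass (not Encog's actual code) makes the distinction concrete: the weights Wx, Wc, Wy and biases bh, by are fixed during a run, while the context vector is overwritten at every time step with a copy of the hidden state, which is what lets the network carry information across time.

```matlab
% x: input sequence, one column per time step; weights assumed already trained.
T = size(x, 2);
context = zeros(size(Wc, 2), 1);                         % context starts at zero
y = zeros(size(Wy, 1), T);
for t = 1:T
    hidden  = tanh(Wx * x(:, t) + Wc * context + bh);    % trained weights applied to the context
    y(:, t) = Wy * hidden + by;
    context = hidden;                                    % context = copy of current hidden state
end
```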