Validation Loss and Accuracy in LSTM Networks with Keras - neural-network

I ran the example code for LSTM networks that uses the IMDB dataset in Keras. The code can be found at the following link.
imdb_lstm.py
My problem is that, as training progresses, the training loss decreases and the training accuracy increases as expected, but the validation accuracy fluctuates within an interval and the validation loss increases to a high value. I attach part of the training log below. I also observe that when the training loss is very small (~0.01-0.03), it sometimes increases in the next epoch and then decreases again; this can be seen in epochs 75-77. In general, though, it decreases.
What I expect is that the training accuracy always increases up to 0.99-1 and the training loss always decreases. Moreover, I expect the validation accuracy to start from, say, 0.4 and rise to, for example, 0.8 by the end. If the validation accuracy does not improve over the epochs, what is the point of waiting through them? Also, the test accuracy is close to 0.81 at the end.
I also tried with my own data and ran into the same situation. I processed my data in a similar way: my training, validation and test points are processed with the same logic as the ones in this example code.
Besides, I did not understand how this code represents the whole sentence after obtaining the outputs from the LSTM for each word. Does it perform mean or max pooling, or does it take only the last output of the LSTM layer before giving it to a logistic regression classifier?
Any help would be appreciated.
Using Theano backend.
Loading data...
25000 train sequences
25000 test sequences
Pad sequences (samples x time)
X_train shape: (25000, 80)
X_test shape: (25000, 80)
Build model...
Train...
Train on 22500 samples, validate on 2500 samples
Epoch 1/100
22500/22500 [==============================] - 236s - loss: 0.5438 - acc: 0.7209 - val_loss: 0.4305 - val_acc: 0.8076
Epoch 2/100
22500/22500 [==============================] - 237s - loss: 0.3843 - acc: 0.8346 - val_loss: 0.3791 - val_acc: 0.8332
Epoch 3/100
22500/22500 [==============================] - 245s - loss: 0.3099 - acc: 0.8716 - val_loss: 0.3736 - val_acc: 0.8440
Epoch 4/100
22500/22500 [==============================] - 243s - loss: 0.2458 - acc: 0.9023 - val_loss: 0.4206 - val_acc: 0.8372
Epoch 5/100
22500/22500 [==============================] - 239s - loss: 0.2120 - acc: 0.9138 - val_loss: 0.3844 - val_acc: 0.8384
....
....
Epoch 75/100
22500/22500 [==============================] - 238s - loss: 0.0134 - acc: 0.9868 - val_loss: 0.9045 - val_acc: 0.8132
Epoch 76/100
22500/22500 [==============================] - 241s - loss: 0.0156 - acc: 0.9845 - val_loss: 0.9078 - val_acc: 0.8211
Epoch 77/100
22500/22500 [==============================] - 235s - loss: 0.0129 - acc: 0.9883 - val_loss: 0.9105 - val_acc: 0.8234

When to stop training: the usual approach is to stop training when some metric computed on the validation data starts to get worse, as this is the usual indicator of overfitting. But notice that you are using dropout, which effectively trains a slightly different model in every epoch. That is why you should apply some kind of patience and stop training only when this phenomenon occurs over several consecutive epochs.
The reason for the fluctuations: the same as in the first point - you are using dropout, which introduces some randomness into your network. In my opinion, this is the main reason for the fluctuations observed.
What Keras models take as input to a Dense layer: if you study the documentation of the LSTM/RNN layer carefully, you will notice that return_sequences=False is the default. This means that only the last output of the processed sequence is fed to the following layer. You could change that by setting return_sequences=True and then aggregating the per-timestep outputs yourself, e.g. with pooling or 1-D convolutions.
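To make the patience idea concrete, here is a minimal sketch in Keras 2 syntax (the layer sizes and dropout rates are illustrative and do not exactly match imdb_lstm.py) that uses the EarlyStopping callback so training stops only after the validation loss has failed to improve for several consecutive epochs:
from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, LSTM, Dense
from keras.callbacks import EarlyStopping

max_features, maxlen = 20000, 80  # assumed values, similar to the example script

(X_train, y_train), (X_test, y_test) = imdb.load_data(num_words=max_features)
X_train = pad_sequences(X_train, maxlen=maxlen)
X_test = pad_sequences(X_test, maxlen=maxlen)

model = Sequential()
model.add(Embedding(max_features, 128))
# return_sequences defaults to False, so only the last timestep's output
# is passed on to the Dense classifier.
model.add(LSTM(128, dropout=0.2, recurrent_dropout=0.2))
model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# Stop once val_loss has failed to improve for 5 consecutive epochs.
early_stop = EarlyStopping(monitor='val_loss', patience=5)
model.fit(X_train, y_train, validation_split=0.1,
          epochs=100, batch_size=32, callbacks=[early_stop])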

Related

The time for each epoch in my neural network is 0 sec

While running my model, the time for each epoch is 0 sec. Does that mean there is something wrong with my model?
Probably training is just fast.
These epoch times are entirely possible when using GPUs - but of course only on smaller datasets/architectures.
Say you are training MNIST with a fairly straightforward network (a LeNet-like CNN); these times are pretty normal.
From scanning the loss/acc values, something is clearly happening, as the numbers are not constant.
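For reference, a LeNet-like CNN of the kind mentioned above could look roughly like the sketch below (not the asker's model; the filter counts and layer sizes are just classic LeNet-ish placeholder values):
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# A small LeNet-style CNN for 28x28 grayscale MNIST images.
model = Sequential()
model.add(Conv2D(6, (5, 5), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(16, (5, 5), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(120, activation='relu'))
model.add(Dense(84, activation='relu'))
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
# On a GPU, one epoch of MNIST with a model this small takes only a few seconds.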

Keras high loss, not decreasing with each epoch

I am learning NN and Keras. My test data is something like this:
Result, HomeWinPossibility, DrawPossibility, AwayWinPossibility
[['AwayWin' 0.41 0.28 0.31]
['HomeWin' 0.55 0.25 0.2]
['AwayWin' 0.17 0.21 0.62]
.....
Here is my model:
model = Sequential()
model.add(Dense(16, input_shape=(3,)))
model.add(Activation('sigmoid'))
model.add(Dense(8, activation='relu'))
model.add(Dense(3))
model.add(Activation('softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=["accuracy"])
model.fit(train_X, train_y_ohe, epochs=100, batch_size=1, verbose=1);
The output from fit is:
Epoch 1/100
190/190 [==============================] - 1s 3ms/step - loss: 0.9151 - acc: 0.5737
Epoch 2/100
190/190 [==============================] - 1s 3ms/step - loss: 0.9181 - acc: 0.5474
Epoch 3/100
190/190 [==============================] - 1s 3ms/step - loss: 0.9111 - acc: 0.5526
Epoch 100/100
190/190 [==============================] - 1s 3ms/step - loss: 0.9130 - acc: 0.5579
So why is the loss not going down, as it does in some NN tutorials I read? Is it because the data I provided is just noise, so the NN can't find any clue, or is something not right with my model?
As the acc is always around 0.55 (so about 55%), does it mean the NN actually achieved better than random guessing (> 33%)? If this is true, why did it achieve an accuracy of 0.57 in the first epoch?
So why the loss is not going down as some NN tutorials I read?
It might be many reasons - all depending on your data. Here are things you could adjust:
You have a very low batch size. Although some data might actually respond well to this, I think a batch size of 1 is too small in most cases - not to mention how redundant the per-sample updates become when you use a batch size of 1. Batch size depends a lot on how much, and what kind of, data you have, but try somewhere around 20-30 if you have sufficient data.
Try different activation functions (but always have softmax or sigmoid in the last layer because you want numbers between 0 and 1).
Increase the number of units in the first and/or second layer (if you have enough data).
Try setting the learning rate (lr) for the Adam optimizer explicitly: model.compile(optimizer=keras.optimizers.Adam(lr=0.001), ...). A combined sketch applying these adjustments follows this list.
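For example, a revised version of the model combining these adjustments might look like the sketch below (the unit counts, learning rate and batch size are illustrative guesses, not tuned values; train_X and train_y_ohe are the arrays from the question):
import keras
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
# More units and relu in the hidden layers; softmax stays in the output layer.
model.add(Dense(64, activation='relu', input_shape=(3,)))
model.add(Dense(32, activation='relu'))
model.add(Dense(3, activation='softmax'))
model.compile(optimizer=keras.optimizers.Adam(lr=0.001),
              loss='categorical_crossentropy', metrics=['accuracy'])
# Batch size larger than 1; 20-30 is a reasonable starting point here.
model.fit(train_X, train_y_ohe, epochs=100, batch_size=25, verbose=1)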
Is it because the data I provided is just noise?
If your data is pure noise across classes, then, given roughly the same number of datapoints in each class, the accuracy would very probably be around 33%, since the network would essentially just be guessing at random.
As the acc is always around 0.55 (so about 55%), does it mean the NN actually achieved better than random guessing (33%)?
Not necessarily. The accuracy is a measure of how many samples were correctly classified. Say that the validation data (conventionally the part of the dataset that the accuracy is calculated on) only contains data from one class. Then if the NN classifies everything as that one class, the validation accuracy would be 100%!
That means that if you don't have the same number of datapoints from each class, accuracy alone is not to be trusted! A much better measure for unbalanced datasets is e.g. the AUC (area under the ROC curve) or the F1 score, which takes false positives into account as well.
I would recommend that you look into the theory behind this. Blindly tweaking things will probably be very frustrating, because you'd have a hard time getting good results - and even when you do get good results, they may not be as good as you think. One place to read would be Ian Goodfellow's book on deep learning.
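As an illustration of the metrics mentioned above, scikit-learn can compute both from the model's predictions. In the sketch below, val_X and val_y_ohe are assumed to be a held-out validation set and its one-hot labels (names chosen for illustration; they are not from the question):
import numpy as np
from sklearn.metrics import roc_auc_score, f1_score

# Predicted class probabilities for the validation set, shape (n_samples, 3).
y_prob = model.predict(val_X)
y_pred = np.argmax(y_prob, axis=1)      # predicted class indices
y_true = np.argmax(val_y_ohe, axis=1)   # true class indices from the one-hot labels

# Macro-averaged scores treat all classes equally, which matters for unbalanced data.
print('AUC (one-vs-rest, macro):',
      roc_auc_score(y_true, y_prob, multi_class='ovr', average='macro'))
print('F1 (macro):', f1_score(y_true, y_pred, average='macro'))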

Train accuracy drops in some epochs

I'm training a ResNet (CIFAR-10 dataset) and the train accuracy is mostly (in 95% of epochs) increasing, but sometimes it drops 5-10% and then starts increasing again.
Here is an example:
Epoch 45/100
40000/40000 [==============================] - 50s 1ms/step - loss: 0.0323 - acc: 0.9948 - val_loss: 1.6562 - val_acc: 0.7404
Epoch 46/100
40000/40000 [==============================] - 52s 1ms/step - loss: 0.0371 - acc: 0.9932 - val_loss: 1.6526 - val_acc: 0.7448
Epoch 47/100
40000/40000 [==============================] - 50s 1ms/step - loss: 0.0266 - acc: 0.9955 - val_loss: 1.6925 - val_acc: 0.7426
Epoch 48/100
40000/40000 [==============================] - 50s 1ms/step - loss: 0.0353 - acc: 0.9940 - val_loss: 2.2682 - val_acc: 0.6496
Epoch 49/100
40000/40000 [==============================] - 50s 1ms/step - loss: 1.6391 - acc: 0.4862 - val_loss: 1.2524 - val_acc: 0.5659
Epoch 50/100
40000/40000 [==============================] - 52s 1ms/step - loss: 0.9220 - acc: 0.6830 - val_loss: 0.9726 - val_acc: 0.6738
Epoch 51/100
40000/40000 [==============================] - 51s 1ms/step - loss: 0.5453 - acc: 0.8165 - val_loss: 1.0232 - val_acc: 0.6963
I quit execution after this, but this was my second run; in the first, the same thing happened and after some time it got back to 99%.
Batch size is 128, so I guess this is not a problem. I haven't changed the learning rate or any other Adam parameters, but I guess that's also not an issue since accuracy is increasing most of the time.
So, why are those sudden drops happening?
Since the training loss jumps up and the training accuracy drops sharply (with the validation metrics disturbed as well), it looks like your optimization algorithm has temporarily overshot the downhill part of the loss function it was trying to follow.
Remember that gradient descent and related methods calculate the gradient at a point and then use it (and sometimes some additional information) to choose the direction and distance to move. This is not always perfect, and sometimes a step goes too far and ends up further uphill again.
If your learning rate is aggressive you will see this every now and then, but you might still converge faster than with a smaller learning rate. You can experiment with different learning rates, but I would not be concerned unless your loss starts to diverge.
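If you do want to tame these jumps, one option is to lower the Adam learning rate and/or clip gradient norms when compiling. This is a sketch only: the values are starting points rather than tuned settings, and `model` is assumed to be the already-built ResNet from the question.
from keras.optimizers import Adam

# A smaller learning rate than the Adam default (0.001), plus gradient-norm
# clipping; both make individual update steps less likely to overshoot.
opt = Adam(lr=0.0003, clipnorm=1.0)
model.compile(optimizer=opt, loss='categorical_crossentropy', metrics=['accuracy'])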

Decrease total-loss in Deep neural network [closed]

I use tflearn.DNN to build a deep neural network:
# Build neural network
net = tflearn.input_data(shape=[None, 5], name='input')
net = tflearn.fully_connected(net, 64, activation='sigmoid')
net = tflearn.batch_normalization(net)
net = tflearn.fully_connected(net, 32, activation='sigmoid')
net = tflearn.batch_normalization(net)
net = tflearn.fully_connected(net, 16, activation='sigmoid')
net = tflearn.batch_normalization(net)
net = tflearn.fully_connected(net, 8, activation='sigmoid')
net = tflearn.batch_normalization(net)
# activation needs to be softmax for classification.
# default loss is cross-entropy and the default metric is accuracy
# cross-entropy + accuracy = categorical network
net = tflearn.fully_connected(net, 2, activation='softmax')
sgd = tflearn.optimizers.SGD(learning_rate=0.01, lr_decay=0.96, decay_step=100)
net = tflearn.regression(net, optimizer=sgd, loss='categorical_crossentropy')
model = tflearn.DNN(net, tensorboard_verbose=0)
I tried many things, but all the time the total loss is around this value:
Training Step: 95 | total loss: 0.68445 | time: 1.436s
| SGD | epoch: 001 | loss: 0.68445 - acc: 0.5670 | val_loss: 0.68363 - val_acc: 0.5714 -- iter: 9415/9415
What can I do to decrease the total loss and make the accuracy get higher?
Many aspects can be considered when improving network performance, including the dataset and the network itself.
From the network structure you pasted alone, it is difficult to give a clear way to increase accuracy without more info about the dataset and the target you want to reach. But the following are some useful practices that may help you debug and improve the network:
1. About the datasets
Is the dataset balanced, and is it free of distortions?
Get more training data.
Add data augmentation if possible.
Normalise the data.
Feature engineering.
2. About the network
Is the network too small or too large?
Check for overfitting or underfitting from the training history, then choose the best number of epochs.
Try initialising the weights with a different initialization scheme.
Try different activation functions, loss functions and optimizers.
Change the number of layers and units.
Change the batch size.
Add dropout layers (a combined sketch applying several of these suggestions follows this list).
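For instance, a variant of the posted network that applies a few of these suggestions (relu activations, a dropout layer, the Adam optimizer) could look like the sketch below; the unit counts, dropout rate and learning rate are illustrative guesses, not tuned values:
import tflearn

net = tflearn.input_data(shape=[None, 5], name='input')
net = tflearn.fully_connected(net, 64, activation='relu')
net = tflearn.batch_normalization(net)
net = tflearn.fully_connected(net, 32, activation='relu')
net = tflearn.batch_normalization(net)
net = tflearn.dropout(net, 0.5)  # dropout to reduce overfitting
net = tflearn.fully_connected(net, 2, activation='softmax')
adam = tflearn.optimizers.Adam(learning_rate=0.001)
net = tflearn.regression(net, optimizer=adam, loss='categorical_crossentropy')
model = tflearn.DNN(net, tensorboard_verbose=0)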
And for deeper analysis, the following articles may be helpful:
How To Improve Deep Learning Performance
How to debug neural networks. Manual

keras loss jumps to zero randomly at the start of a new epoch

I'm training a network which has multiple losses and both creating and feeding the data into my network using a generator.
I've checked the structure of the data and it looks fine in general, and it also trains pretty much as expected most of the time. However, at a random epoch almost every time, the training loss for every prediction suddenly jumps from, say,
# End of epoch 3
loss: 2.8845
to
# Beginning of epoch 4
loss: 1.1921e-07
I thought it could be the data; however, from what I can tell the data is generally fine, and it's even more suspicious because this happens at a random epoch (could it be because of a random data point chosen during SGD?) but then persists throughout the rest of training. That is, if at epoch 3 the training loss decreases to 1.1921e-07, it continues this way in epoch 4, epoch 5, etc.
However, there are times when it reaches epoch 5 and hasn't done this yet and then might do it at epoch 6 or 7.
Is there any viable reason outside of the data that could cause this? Could it even happen that a few bad data points cause this so quickly?
Thanks
EDIT:
Results:
300/300 [==============================] - 339s - loss: 3.2912 - loss_1: 1.8683 - loss_2: 9.1352 - loss_3: 5.9845 -
val_loss: 1.1921e-07 - val_loss_1: 1.1921e-07 - val_loss_2: 1.1921e-07 - val_loss_3: 1.1921e-07
All subsequent epochs after this have a training loss of 1.1921e-07.
Not entirely sure how satisfactory this is as an answer, but my findings seem to show that using multiple categorical_crossentropy losses together results in a very unstable network. Swapping them out for other loss functions fixes the problem, with the data remaining unchanged.
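For reference, per-output losses in a multi-output Keras model are assigned at compile time. The sketch below uses the functional API with hypothetical layer shapes and loss choices (not the asker's actual model) to show how different losses can be mixed:
from keras.models import Model
from keras.layers import Input, Dense

inputs = Input(shape=(10,))
shared = Dense(32, activation='relu')(inputs)
# Two hypothetical prediction heads with their own names.
out_a = Dense(3, activation='softmax', name='out_a')(shared)
out_b = Dense(1, activation='linear', name='out_b')(shared)

model = Model(inputs=inputs, outputs=[out_a, out_b])
# A different loss per named output; loss_weights control each loss's contribution.
model.compile(optimizer='adam',
              loss={'out_a': 'categorical_crossentropy', 'out_b': 'mse'},
              loss_weights={'out_a': 1.0, 'out_b': 0.5})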