I am learning and experimenting with neural networks and would like the opinion of someone more experienced on the following issue:
When I train an Autoencoder in Keras ('mean_squared_error' loss function and SGD optimizer), the validation loss gradually goes down and the validation accuracy goes up. So far so good.
However, after a while, the loss keeps decreasing but the accuracy suddenly falls back to a much lower level.
Is it 'normal' or expected behavior that the accuracy rises very fast and stays high, only to fall back suddenly?
Should I stop training at the maximum accuracy even if the validation loss is still decreasing? In other words, should I monitor val_acc or val_loss for early stopping?
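To make the val_loss versus val_acc choice concrete, here is a minimal pure-Python sketch of patience-based early stopping. The function name, the patience value and the two metric curves are hypothetical; in recent Keras versions the same idea is provided by the EarlyStopping callback (monitor, mode, patience, min_delta, restore_best_weights), so treat this only as an illustration of the rule, not of Keras internals.

```python
def early_stopping_epoch(history, mode="min", patience=3, min_delta=0.0):
    """Return (best_epoch, stop_epoch) for one monitored metric.

    mode="min" suits a loss such as val_loss; mode="max" suits val_acc.
    Training stops once `patience` consecutive epochs bring no improvement
    larger than `min_delta` over the best value seen so far.
    """
    best_epoch, best_value, wait = 0, history[0], 0
    for epoch in range(1, len(history)):
        value = history[epoch]
        if mode == "min":
            improved = value < best_value - min_delta
        else:
            improved = value > best_value + min_delta
        if improved:
            best_epoch, best_value, wait = epoch, value, 0
        else:
            wait += 1
            if wait >= patience:
                return best_epoch, epoch
    return best_epoch, len(history) - 1


# Hypothetical curves shaped like the ones described in the question.
val_loss = [0.9, 0.7, 0.6, 0.55, 0.5, 0.48]
val_acc = [0.2, 0.5, 0.8, 0.6, 0.4, 0.3]
print(early_stopping_epoch(val_loss, mode="min", patience=2))  # (5, 5): loss never stalls
print(early_stopping_epoch(val_acc, mode="max", patience=2))   # (2, 4): stops after the accuracy peak
```

The two monitors disagree on exactly the kind of curves in the question, which is why the choice of monitored metric matters.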
See images:
Loss plot (green = val, blue = train)
Accuracy plot (green = val, blue = train)
UPDATE:
The comments below pointed me in the right direction and I think I understand it better now. It would be nice if someone could confirm that the following is correct:
The accuracy metric measures the percentage of predictions where y_pred == y_true, and thus only makes sense for classification.
My data is a combination of real and binary features. The accuracy graph rises steeply and then falls back while the loss continues to decrease because around epoch 5000 the network probably predicted roughly 50% of the binary features correctly. As training continued, around epoch 12000, the prediction of the real and binary features together improved, hence the decreasing loss, but the prediction of the binary features alone became slightly less accurate. Therefore the accuracy falls while the loss decreases.
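The explanation in the update can be checked on a toy example. Assumptions: a single hypothetical sample with two real-valued and two binary features, and "accuracy" defined as thresholding the binary outputs at 0.5 (what Keras actually computes for 'accuracy' depends on its version and the loss, so this is only an illustration).

```python
def mse(y_true, y_pred):
    """Mean squared error over all features, real and binary alike."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_feature_accuracy(y_true, y_pred, binary_idx, threshold=0.5):
    """Fraction of binary features whose thresholded prediction matches."""
    hits = sum(int(y_pred[i] >= threshold) == y_true[i] for i in binary_idx)
    return hits / len(binary_idx)


# Hypothetical sample: features 0 and 2 are real-valued, 1 and 3 are binary.
y_true = [0.3, 1, 0.7, 0]
early = [0.9, 0.6, 0.1, 0.4]   # binary features right, real features far off
late = [0.3, 0.45, 0.7, 0.55]  # real features exact, binary features flipped
binary_idx = [1, 3]

print(mse(y_true, early), binary_feature_accuracy(y_true, early, binary_idx))
print(mse(y_true, late), binary_feature_accuracy(y_true, late, binary_idx))
```

The later prediction has a lower MSE (about 0.15 versus 0.26) yet gets none of the binary features right, which is the "loss down, accuracy down" pattern described above.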
If the targets are real-valued, i.e. the data is continuous rather than discrete, then use MSE (Mean Squared Error).
But for discrete values, i.e. classification or clustering, use accuracy, because the values are either 0 or 1. Here the concept of MSE is not applicable; instead use accuracy = number of correct predictions / total predictions * 100.
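For discrete labels, accuracy counts correct predictions and the error rate is its complement. A small sketch of both percentages; the helper names and the example labels are hypothetical.

```python
def accuracy_percent(y_true, y_pred):
    """Accuracy = number of correct predictions / total predictions * 100."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct * 100 / len(y_true)


def error_rate_percent(y_true, y_pred):
    """Error rate = number of wrong predictions / total predictions * 100."""
    return 100 - accuracy_percent(y_true, y_pred)


labels = [0, 1, 1, 0, 1]
preds = [0, 1, 0, 0, 1]
print(accuracy_percent(labels, preds))    # 80.0 (4 of 5 correct)
print(error_rate_percent(labels, preds))  # 20.0 (1 of 5 wrong)
```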
Related
I have built a custom skin cancer classification system using Keras (2.2.2), Python (3.6), and TensorFlow (1.9.0).
Here are the training accuracy, validation accuracy, and validation loss graphs I am getting (epochs on the x-axis).
Is it safe to assume that after epoch 640 my model is overfitting?
Can we say that we have reached the global minimum and are just oscillating there?
It doesn't look like it's overfitting, because there is not a big difference between the training and validation accuracy. Even if the network has trained fully, it can still get stuck in a local minimum. Try experimenting with different optimizers and changing the hyperparameters.
But one thing I want to point out is that accuracy is not a good metric for evaluating your model.
Check this link for more details: https://stats.stackexchange.com/questions/312780/why-is-accuracy-not-the-best-measure-for-assessing-classification-models
Yes, at 640 training is definitely going wrong. From your graph, you have most likely been on the wrong track since epoch 200. Using this hindsight, you can retrain your set and, at epoch 200, give it a jitter slightly greater than half the size of the bounce that happens later (around epoch 400) to avoid falling into a local minimum, then continue for ~100-300 epochs. By adjusting earlier, you give the model a chance to adapt to unknowns.
I have two trained sklearn.tree.tree.DecisionTreeClassifiers. Both are trained on the same training data, but with different maximum depths: 6 for decision_tree_model and 2 for small_model. Besides max_depth, no other parameters were specified.
When I compute the accuracy of both on the training data like this:
small_model_accuracy = small_model.score(training_data_sparse_matrix, training_data_labels)
decision_tree_model_accuracy = decision_tree_model.score(training_data_sparse_matrix, training_data_labels)
Surprisingly the output is:
small_model accuracy: 0.61170212766
decision_tree_model accuracy: 0.422496238986
How is this even possible? Shouldn't a tree with a higher maximum depth always have a higher accuracy on the training data when trained on the same data? Could it be that the score function outputs 1 - accuracy or something similar?
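One way to rule out the "1 - accuracy" hypothesis is to compare score() with accuracy computed directly. A minimal sketch, assuming scikit-learn is installed and using synthetic data from make_classification as a stand-in for the real training set; on the same data the deeper tree normally reaches at least the training accuracy of the shallow one.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real training data (an assumption for illustration).
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

small_model = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
decision_tree_model = DecisionTreeClassifier(max_depth=6, random_state=0).fit(X, y)

for name, model in [("small_model", small_model), ("decision_tree_model", decision_tree_model)]:
    # For classifiers, score() returns mean accuracy, the same value as accuracy_score.
    score = model.score(X, y)
    assert score == accuracy_score(y, model.predict(X))
    print(name, "training accuracy:", score)
```

If the numbers still look inverted with real data, the usual suspect is that the matrix passed to score() is not the one (or not in the same row order as the labels) that was passed to fit().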
EDIT:
I just tested it with an even higher maximum depth. The returned value becomes even lower. This hints at it being 1 - accuracy or something like that.
EDIT#2:
It seems to be a mistake I made when handling the training data. I thought about the whole thing again and concluded: "Well, if the depth is higher, the tree shouldn't be the reason for this. What else is there? The training data itself. But I used the same data! Maybe I did something to the training data in between?"
Then I checked again and found a difference in how I use the training data: I need to transform it from an SFrame into a scipy matrix (which might have to be sparse too). I then did another accuracy calculation right after fitting the two models. This gives 61% accuracy for small_model and 64% for decision_tree_model. That's only 3 percentage points more and still somewhat surprising, but at least it's possible.
EDIT#3:
The problem is resolved. I handled the training data in a wrong way and that resulted in different fitting.
Here is the plot of accuracy after fixing the mistakes:
This looks correct and would also explain why the assignment creators chose 6 as the maximum depth.
"Shouldn't a tree with a higher maximum depth always have a higher accuracy when learned with the same training data?"
No, definitely not always. The problem is that you're overfitting your model to your training data by fitting a more complex tree, hence the lower score as you increase the maximum depth.
I am using MXNet to train an 11-class image classifier. I am observing weird behavior: training accuracy increased slowly up to 39%, then in the next epoch it dropped to 9% and stayed close to 9% for the rest of training.
I restarted training from the saved model (with 39% training accuracy), keeping all other parameters the same. Now training accuracy is increasing again. What could be the reason? I don't understand it, and it makes the model hard to train because I have to watch the training accuracy constantly.
The learning rate is constant at 0.01.
As you can see, your later accuracy is near random (1/11, about 9%, for 11 classes). There are two common causes in cases like this:
Your learning rate is too high. Try lowering it.
The loss (e.g. cross-entropy) you are using is producing NaN values. If your loss uses log functions, make sure they are never evaluated at 0.
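Both symptoms can be caught automatically instead of by watching the log. A minimal pure-Python sketch; the helper name check_epoch and the 0.02 tolerance are hypothetical, and in a real MXNet loop you would pass in whatever loss and accuracy your metrics report each epoch.

```python
import math


def check_epoch(loss, accuracy, n_classes, tol=0.02):
    """Return a list of problems found in one epoch's training metrics.

    - A NaN loss usually means something like log(0) was evaluated.
    - Accuracy within `tol` of 1 / n_classes means the model is guessing.
    """
    issues = []
    if math.isnan(loss):
        issues.append("loss is NaN")
    chance = 1.0 / n_classes
    if abs(accuracy - chance) <= tol:
        issues.append(f"accuracy {accuracy:.3f} is at chance level ({chance:.3f})")
    return issues


print(check_epoch(1.7, 0.39, n_classes=11))           # healthy epoch: []
print(check_epoch(float("nan"), 0.09, n_classes=11))  # the collapse described above
```

On a warning you could stop training or reload the last good checkpoint, which matches what restarting from the 39% model did by hand.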
It is common during training of neural networks for accuracy to improve for a while and then get worse -- in general this is caused by over-fitting. It's also fairly common for the network to "get unlucky" and get knocked into a bad part of parameter space corresponding to a sudden decrease in accuracy -- sometimes it can recover from this quickly, but sometimes not.
In general, lowering your learning rate is a good approach to this kind of problem. Also, setting a learning rate schedule like FactorScheduler can help you achieve more stable convergence by lowering the learning rate every few epochs. In fact, this can sometimes cover up mistakes in picking an initial learning rate that is too high.
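MXNet's FactorScheduler documents its rule as base_lr * factor^floor(num_update / step). A pure-Python sketch of that step-decay rule follows; the function name, the default values and the min_lr floor (loosely modeled on the scheduler's stop_factor_lr) are assumptions for illustration, and the real class may differ slightly at step boundaries.

```python
def factor_schedule(num_update, base_lr=0.01, step=1000, factor=0.5, min_lr=1e-8):
    """Step decay: multiply the learning rate by `factor` every `step` updates.

    lr = base_lr * factor ** floor(num_update / step), never below `min_lr`.
    """
    lr = base_lr * factor ** (num_update // step)
    return max(lr, min_lr)


for update in (0, 999, 1000, 2500, 5000):
    print(update, factor_schedule(update))
```

Starting from the question's constant 0.01, this halves the rate every 1000 updates, so a too-high initial choice is corrected automatically over time.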
I faced the same problem, and I solved it by using the (y-a)^2 loss function instead of cross-entropy (because of log(0)). I hope there is a better solution for this problem.
These problems often come up. I observed that this may happen due to one of the following reasons:
Something returning NaN
The inputs of the network are not as expected - many modern frameworks do not raise errors in some of such cases
The model layers get incompatible shapes at some point
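This probably happened because 0 * log(0) evaluates to NaN.
You might avoid it by clipping the predicted probabilities (the softmax outputs, not raw logits) before taking the log:
cross_entropy = -tf.reduce_sum(labels * tf.log(tf.clip_by_value(predictions, 1e-10, 1.0)))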
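The same clipping idea can be shown without TensorFlow. A minimal pure-Python sketch; the function name and the 1e-10 epsilon mirror the one-liner but are otherwise assumptions. In TensorFlow itself, computing the loss from logits with a fused op such as tf.nn.softmax_cross_entropy_with_logits is another way to sidestep log(0).

```python
import math


def cross_entropy(labels, probs, eps=1e-10):
    """Cross-entropy -sum(t * log(p)) with p clipped to [eps, 1.0].

    `probs` must be probabilities (e.g. softmax outputs), not raw logits.
    Clipping keeps log() away from 0, so a 0 * log(0) term cannot become NaN.
    """
    return -sum(t * math.log(min(max(p, eps), 1.0)) for t, p in zip(labels, probs))


labels = [0.0, 1.0, 0.0]
probs = [0.0, 0.9, 0.1]  # an exact zero would make log(0) blow up without clipping

print(0.0 * float("-inf"))           # nan: the 0 * log(0) case in IEEE floats
print(cross_entropy(labels, probs))  # finite: equals -log(0.9)
```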
Throughout the whole training process, accuracy stays at 0.1. What am I doing wrong?
The model, solver, and part of the log are here:
https://gist.github.com/yutkin/3a147ebbb9b293697010
Topology in png format:
P.S. I am using the latest version of Caffe and g2.2xlarge instance on AWS.
You're working on the CIFAR-10 dataset, which has 10 classes. When training of a network starts, its guesses are essentially random, so the accuracy is 1/N, where N is the number of classes; in your case 1/10, i.e. 0.1. If your accuracy stays there over time, your network isn't learning anything. This may happen because of a large learning rate. The basic idea of training a network is that you calculate the loss and propagate it back. The gradients are multiplied by the learning rate and subtracted from the current weights and biases. If the learning rate is too big, you may overshoot the minimum every time; if it is too small, convergence will be slow.
I see that your base_lr here is 0.01. In my experience, this is somewhat large. You may want to start at 0.001 and then keep reducing it by a factor of 10 whenever you observe that the accuracy is not improving, although anything below 0.00001 usually doesn't make much of a difference. The trick is to observe the progress of training and change parameters as needed.
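The effect of the learning rate can be seen on a toy problem (not the CIFAR-10 network): plain gradient descent on f(w) = w**2, whose gradient is 2*w, so every update multiplies w by (1 - 2*lr). The function name and the chosen rates are hypothetical.

```python
def gradient_descent(lr, w0=1.0, steps=20):
    """Minimize f(w) = w**2 (gradient 2*w) and return the final weight.

    Each update is w <- w - lr * 2 * w = w * (1 - 2 * lr), so the iterate
    shrinks toward the minimum at 0 only when |1 - 2 * lr| < 1, i.e. 0 < lr < 1.
    """
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # subtract the scaled gradient
    return w


for lr in (0.001, 0.1, 0.9, 1.1):
    print(lr, gradient_descent(lr))
```

With lr=0.001 the weight barely moves (slow convergence), 0.1 converges smoothly, 0.9 overshoots on every step but still converges, and 1.1 overshoots by more each step and diverges.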
I know the thread is quite old but maybe my answer helps somebody. I experienced the same problem with an accuracy like a random guess.
What helped was to set the number of outputs of the last layer before the accuracy layer to the number of labels.
In your case that should be the ip2 layer. Open the model definition of your net and set num_output to the number of labels.
See Section 4.4 for more information: A Practical Introduction to Deep Learning with Caffe and Python
I've developed a "Pong"-style game which effectively has a ball at the bottom of the screen, bouncy walls on the left and right, and a sticky wall at the top. It randomly chooses a point on the bottom (on a straight horizontal line) and a random angle, bounces off the side walls, and hits the top wall. This is repeated 1000 times, and each time I record the x-value of the launch position, the launch angle, and the final x-value of the point where the ball collides with the top wall.
This gives me 2 inputs (x-value of launch and launch angle) and 1 output (x-value of final position). I tried a multilayer perceptron with 2 input nodes, 2 hidden nodes (1 layer), and 1 output node. However, the error decreases to about 20 and then levels off. Here's what I've tried; none of it helped, and either the error never converges or it starts diverging:
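The target in this setup can also be computed in closed form, which is handy for generating noise-free data. A minimal sketch, assuming a box of hypothetical width and height and an angle in radians measured from the horizontal: instead of simulating each bounce, "unfold" the reflections and fold the straight-line position back into the box.

```python
import math


def final_x(x0, angle, width=10.0, height=10.0):
    """Return where a ball launched from (x0, 0) hits the top wall y = height.

    Assumptions: walls at x = 0 and x = width reflect the ball perfectly,
    and `angle` is measured in radians from the horizontal (0 < angle < pi).
    """
    x = x0 + height / math.tan(angle)  # horizontal distance on the unfolded path
    m = x % (2 * width)                # the pattern repeats every two widths
    return m if m <= width else 2 * width - m


print(final_x(5.0, math.pi / 2))      # straight up: about 5.0
print(final_x(5.0, math.pi / 4))      # one bounce off the right wall: about 5.0
print(final_x(1.0, 3 * math.pi / 4))  # one bounce off the left wall: about 9.0
```

The output folds back at every wall hit, so for shallow angles the mapping from (launch x, angle) to final x changes direction many times, which may be relevant to how much capacity and data the network needs.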
Transform inputs and output to be between 0 and 1
Transform inputs and output to be between -1 and 1
Increase number of hidden layers
Increase number of nodes in hidden layer
Convert the launch position, launch angle and final position into 0s and 1s resulting in ~750+175 inputs and ~750 outputs - no convergence
So, after spending all night and morning and making my brain and body revolt against me, I'm hoping someone can help me identify the problem here. Is this a task that's just not solvable by a neural network or am I doing something wrong?
PS: I'm using the online version of Neuroph rather than coding my own procedure. At least this helps me avoid implementation issues.
If it doesn't minimize the training error, that's most likely a bug in the implementation. If you're measuring the accuracy on a held-out test set, on the other hand, there's nothing surprising about the error going up after a while.
As to the formulation, I think with sufficient amount of training data and sufficiently long training time, a sufficiently complex NN can learn the mapping whether you binarize the input or not (provided the implementation you use supports non-binary input and output). I have only a vague idea of what "sufficient" means in the above sentence, but I'd venture a guess that 1000 samples won't do. Note also that the more complex the network, the more data it will generally need to estimate the parameters.
To eliminate potential implementation issues in Neuroph, I'd suggest trying the exact same process (Multi-Layer Perceptron, same parameters, same data, etc.) but use Weka instead.
I've used the MLP in Weka before with success, so I can vouch that this implementation works correctly. I know Weka is widely used in the academic community and is fairly well vetted, but I'm not sure about Neuroph since it's newer. If you get the same results as with Neuroph, then you know the issue is in your data or in your neural net topology or configuration.
Qnan brings up a good point - what exactly is the error you are measuring? To really determine why the training error isn't converging towards zero, you need to determine what exactly it is that the error represents.
Also, how many epochs (i.e., number of iterations) is the neural net running in training before it stops converging?
In Weka, if I recall correctly you can set the training to execute either until the error reaches a certain value or for a certain number of epochs. Looks like Neuroph is the same way, from a quick look.
If you're limiting the number of epochs, try bumping up the number to something significantly higher to give the network more iterations to converge.
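The two stopping rules mentioned above (a target error or a fixed number of epochs) can be sketched in a few lines of pure Python. The names train_until and make_toy_step are hypothetical, and the toy step simply halves the error each epoch as a stand-in for a real training pass in Weka or Neuroph.

```python
def train_until(step_fn, max_epochs=10000, target_error=1e-3):
    """Call step_fn() once per epoch until the error it returns is at most
    target_error or max_epochs epochs have run. Returns (epochs_run, error)."""
    error = float("inf")
    for epoch in range(1, max_epochs + 1):
        error = step_fn()
        if error <= target_error:
            return epoch, error
    return max_epochs, error


def make_toy_step(start=1.0, decay=0.5):
    """Hypothetical stand-in for one training epoch: the error halves each call."""
    error = start

    def step():
        nonlocal error
        error *= decay
        return error

    return step


print(train_until(make_toy_step(), max_epochs=5))    # (5, 0.03125): epoch cap reached first
print(train_until(make_toy_step(), max_epochs=100))  # (10, 0.0009765625): target reached first
```

If the first call stops at the cap while the error is still falling, raising max_epochs is the cheap thing to try before changing the topology.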