Can dropout increase training data performance? - neural-network

I am training a neural network with dropout. As I decrease dropout from 0.9 to 0.7, the loss (cross-validation error) for the training data also decreases. I also noticed that accuracy increases as I reduce the dropout parameter.
It seems odd to me. Does it make sense?

Dropout is a regularization technique. You should use it only to reduce variance (the gap between validation performance and training performance). It is not intended to reduce bias, and you should not use it in this way; that is very misleading.
The reason you see this behavior is probably that you use a very high value for dropout. A rate of 0.9 means you neutralize too many neurons. It makes sense that once you use 0.7 instead, the network has more neurons available while learning on the training set, so performance increases at lower values.
You should usually see the training performance drop a bit while the performance on the validation set increases (if you do not have one, then at least on the test set). That is the desired behavior when using dropout. The behavior you currently see is because of the very high value of dropout.
Start with 0.2 or 0.3 and compare bias vs. variance in order to find a good value for dropout.
My clear recommendation: don't use it to improve bias, but to reduce variance (error on validation set).
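As a minimal sketch of how to compare values (assuming a Keras-style feed-forward classifier; the layer sizes, the data arrays x_train / y_train, and the epoch count are placeholders, not taken from the question):

    import tensorflow as tf

    def build_model(dropout_rate):
        # Hypothetical small classifier; the sizes are placeholders.
        return tf.keras.Sequential([
            tf.keras.layers.Dense(128, activation="relu"),
            tf.keras.layers.Dropout(dropout_rate),
            tf.keras.layers.Dense(10, activation="softmax"),
        ])

    for rate in (0.2, 0.3, 0.7, 0.9):
        model = build_model(rate)
        model.compile(optimizer="adam",
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        history = model.fit(x_train, y_train, epochs=20,
                            validation_split=0.2, verbose=0)
        train_acc = history.history["accuracy"][-1]
        val_acc = history.history["val_accuracy"][-1]
        print(f"dropout={rate}: train={train_acc:.3f} val={val_acc:.3f} "
              f"gap={train_acc - val_acc:.3f}")

A large train/validation gap suggests variance you can attack with more dropout; low training accuracy at high rates is the bias problem described above.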
In order to fit better on the training set I recommend:
find a better architecture (or change the number of neurons per layer)
try different optimizers (see the sketch after this list)
hyperparameter tuning
maybe train the network a bit longer
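As a rough illustration of the optimizer point (reusing the hypothetical build_model sketch above; the optimizers and learning rates are just examples, not a recommendation tuned for your network):

    import tensorflow as tf

    optimizers = {
        "sgd": tf.keras.optimizers.SGD(learning_rate=0.01),
        "rmsprop": tf.keras.optimizers.RMSprop(learning_rate=0.001),
        "adam": tf.keras.optimizers.Adam(learning_rate=0.001),
    }
    for name, opt in optimizers.items():
        model = build_model(0.3)
        model.compile(optimizer=opt,
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        # A few more epochs than before, per the "train a bit longer" suggestion.
        history = model.fit(x_train, y_train, epochs=40,
                            validation_split=0.2, verbose=0)
        print(name, history.history["accuracy"][-1],
              history.history["val_accuracy"][-1])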
Hopefully this helps!

Dropout works by probabilistically removing, or “dropping out,” inputs to a layer, which may be input variables in the data sample or activations from a previous layer. It has the effect of simulating a large number of networks with a very different network structure and, in turn, making nodes in the network generally more robust to the inputs.
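For intuition, the naive form of this is just a random binary mask over the activations; a minimal NumPy sketch (the array shape and rate are arbitrary):

    import numpy as np

    rng = np.random.default_rng(0)
    activations = rng.normal(size=(4, 8))                # toy batch of layer activations
    drop_rate = 0.5
    mask = rng.random(activations.shape) >= drop_rate    # keep a unit with prob 1 - drop_rate
    dropped = activations * mask                         # dropped units contribute nothing downstream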
With a moderate dropout rate (below some threshold), the accuracy will gradually increase and the loss will gradually decrease at first (that is what is happening in your case).
When you increase dropout beyond a certain threshold, the model is no longer able to fit properly. Intuitively, a higher dropout rate introduces higher variance in some of the layers, which also degrades training.
What you should always remember is that dropout, like all other forms of regularization, reduces effective model capacity. If you reduce the capacity too much, you will certainly get bad results.
Hope this helps.

Related

Is it possible that accuracy gets reduced while increasing the number of epochs?

I am training a DNN in MATLAB. While optimizing my network, I am observing a decrease in accuracy as I increase the number of epochs. Is that possible?
The loss value, on the other hand, decreases during training as the epochs increase. Please guide.
tl;dr: absolutely.
When the entire training dataset has been seen by the model once (fed forward once), that is termed one epoch.
The general behaviour of accuracy as a function of the number of epochs is this: training for more epochs can result in low accuracy on validation, even though the loss continues to decrease (training accuracy will be high). This is termed overfitting.
The number of epochs to train for is also a hyperparameter that needs tuning.
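A common way to pick that point automatically (a sketch assuming a Keras-style model object; the patience value and epoch bound are just examples) is early stopping on the validation loss:

    import tensorflow as tf

    early_stop = tf.keras.callbacks.EarlyStopping(
        monitor="val_loss",         # watch the validation loss
        patience=5,                 # tolerate 5 epochs without improvement
        restore_best_weights=True,  # roll back to the best epoch seen
    )
    model.fit(x_train, y_train,
              epochs=200,                # upper bound; early stopping decides the real number
              validation_split=0.2,
              callbacks=[early_stop])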
It is absolutely possible:
Especially when you are training in batches
When your learning rate is too high

Caffe: why Dropout layer exists also in Deploy (testing)?

I understand that Dropout is for efficient training, avoiding over-fitting and speeding up learning. However, I do not understand why I also see it in the deploy (testing) network.
Should I set dropout_ratio: 1.0 when testing?
TL;DR
Don't touch the dropout layer. Caffe knows it should do nothing during inference.
"Dropout" is indeed a very powerful addition to the learning process, and it seemingly has no impact at inference time.
However, consider a naive implementation where at train time one only sets some of the neurons to zero; at test time you must then compensate for activating all neurons by scaling the activations (to get the same overall "strength" of the signal). In that case, the inference-time "Dropout" becomes a simple scale layer (with a known, fixed scale factor).
Fortunately, a more thoughtful implementation does this scaling as part of training (that is, setting some of the neurons to zero and simultaneously scaling up the rest of the neurons by a predefined scale factor). This way, at inference time the "Dropout" layer does absolutely nothing.
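A minimal NumPy sketch of this "inverted dropout" idea (framework-agnostic, not Caffe's actual implementation):

    import numpy as np

    rng = np.random.default_rng(0)
    drop_rate = 0.5

    def dropout_forward(x, training):
        if not training:
            return x                                 # inference: identity, nothing to do
        mask = rng.random(x.shape) >= drop_rate      # zero some units...
        return x * mask / (1.0 - drop_rate)          # ...and scale the rest up at train time

    x = rng.normal(size=(2, 4))
    print(dropout_forward(x, training=True))    # randomly zeroed and rescaled
    print(dropout_forward(x, training=False))   # unchanged at test time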
To learn more about the contribution of "Dropout" to the stability of training and its impact on the generalization capacity of the net, you can read Section 7.12 of Bengio's deep learning book.

TensorFlow: Binary classification accuracy

In the context of binary classification, I use a neural network with one hidden layer using a tanh activation function. The input comes from a word2vec model and is normalized.
The classifier accuracy is between 49% and 54%.
I used a confusion matrix to get a better understanding of what's going on. I studied the impact of the number of input features and the number of neurons in the hidden layer on the accuracy.
What I can observe from the confusion matrix is that, depending on the parameters, the model sometimes predicts most of the samples as positive and sometimes most of them as negative.
Any suggestion as to why this happens? And which other factors (other than input size and hidden layer size) might impact the accuracy of the classification?
Thanks
It's a bit hard to guess given the information you provide.
Are the labels balanced (50% positives, 50% negatives)? If so, this would mean your network is not training at all, as your performance roughly corresponds to random guessing. Is there maybe a bug in the preprocessing? Or is the task too difficult? What is the training set size?
I don't believe that the number of neurons is the issue, as long as it's reasonable, i.e. hundreds or a few thousand.
Alternatively, you can try another loss function, namely cross entropy, which is standard for multi-class classification and can also be used for binary classification:
https://www.tensorflow.org/api_docs/python/nn/classification#softmax_cross_entropy_with_logits
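For the binary case specifically, a minimal sketch with the sigmoid variant (labels and logits are placeholders for your own tensors; the linked softmax version also works if you use two output units):

    import tensorflow as tf

    # logits: raw, unactivated outputs of the last layer, shape (batch, 1)
    # labels: 0.0 / 1.0 targets with the same shape
    per_example = tf.nn.sigmoid_cross_entropy_with_logits(labels=labels,
                                                          logits=logits)
    loss = tf.reduce_mean(per_example)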
Hope this helps.
The data set is well balanced, 50% positive and negative.
The training set shape is (411426, X).
The test set shape is (68572, X).
X is the number of features coming from word2vec, and I tried values between [100, 300].
I have one hidden layer, and the number of neurons that I tested varied between [100, 300].
I also tested with much smaller feature/neuron sizes: 2-20 features and 10 neurons in the hidden layer.
I also use cross entropy as the cost function.

Neural Network: validation loss constant, training loss decreasing

I have a neural network which does image segmentation. I trained it for ~100 epochs. The current situation is that the validation loss is constant (0.2 +/- 0.03) and the training loss is still decreasing (currently 0.07), but very slowly.
The results of the neural network are quite good.
What does this mean? Is it overfitting? Should I stop the training?
I currently use dropout in the first layer (50%). Would it make sense to add dropout to every layer (there are about ~15 layers)? Or should I also add L2 regularization? Does it make sense to use both L2 and dropout?
Thank you very much
It is recommended to use L2 when you use dropout. I think that your dropout at 50% is a little too high. People usually use it around 20% depending on the operations.
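A minimal sketch of combining the two (Keras-style; the layer sizes, the 20% dropout rate, the L2 factor, and num_classes are illustrative placeholders, not values tuned for your segmentation network):

    import tensorflow as tf

    l2 = tf.keras.regularizers.l2(1e-4)   # small weight penalty on the kernels
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(256, activation="relu", kernel_regularizer=l2),
        tf.keras.layers.Dropout(0.2),      # ~20% dropout instead of 50%
        tf.keras.layers.Dense(256, activation="relu", kernel_regularizer=l2),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])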
Moreover, 100 epochs may not be enough, it depends on the size of your training set and the size of your neural network.
What do you mean by "quite good"? Please quantify it and share an example. The validation loss and accuracy are just "indicators"; their values also depend on the NN and the training set, so 0.2 can be either bad or good depending on your problem.

ANN different results for same train-test sets

I'm implementing a neural network for a supervised classification task in MATLAB.
I have a training set and a test set to evaluate the results.
The problem is that every time I train the network for the same training set I get very different results (sometimes I get a 95% classification accuracy and sometimes like 60%) for the same test set.
Now I know this is because I get different initial weights, and I know that I can use a 'seed' to set the same initial weights, but the question is: what does this say about my data, and what is the right way to look at this? How do I report the accuracy I'm getting with my designed ANN? Is there a protocol for this (like running the ANN 50 times and taking the average accuracy, or something similar)?
Thanks
Make sure your test set is large enough compared to the training set (e.g. 10% of the overall data) and check its diversity. If your test set only covers very specific cases, this could be a reason. Also make sure you always use the same test set. Alternatively, you should google the term cross-validation.
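For the "is there a protocol" part, a common pragmatic sketch (train_and_evaluate is a hypothetical helper standing in for your MATLAB training and testing code) is to repeat training with different seeds and report the mean and spread:

    import numpy as np

    accuracies = []
    for seed in range(20):                                        # e.g. 20 or 50 independent runs
        acc = train_and_evaluate(train_set, test_set, seed=seed)  # hypothetical helper
        accuracies.append(acc)

    accuracies = np.array(accuracies)
    print(f"test accuracy: {accuracies.mean():.3f} +/- {accuracies.std():.3f}")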
Furthermore, observing good training set accuracy while observing bad test set accuracy is a sign of overfitting. Try to apply regularization, like a simple L2 weight decay (simply multiply your weight matrices by e.g. 0.999 after each weight update). Depending on your data, dropout or L1 regularization could also help (especially if you have a lot of redundancy in your input data). Also try choosing a smaller network topology (fewer layers and/or fewer neurons per layer).
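That weight-decay suggestion, as a framework-agnostic sketch (weights is a hypothetical list standing in for whatever weight matrices your network keeps):

    decay = 0.999
    # ... run one normal weight update, then shrink the weights slightly:
    for i, W in enumerate(weights):     # weights: list of the network's weight matrices
        weights[i] = decay * W          # simple L2-style shrinkage toward zero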
To speed up training, you could also try alternative learning algorithms like RPROP+, RPROP- or RMSProp instead of plain backpropagation.
Looks like your ANN is not converging to the optimal set of weights. Without further details of the ANN model, I cannot pinpoint the problem, but I would try increasing the number of iterations.