In a Neural Network, should bias have a momentum term? - neural-network

Should the momentum be added also to the bias term of every node in the network or preferably only on weights?

Bias and weights. If you just applied it to the weights, the bias would lag the weights, artificially increasing the error and slowing convergence.
Think of bias as simply one more weight -- an extra input that's always 1.
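A minimal NumPy sketch of classic momentum SGD, assuming the gradients dW and db have already been computed by backpropagation (all names and values here are illustrative); the point is that the bias gets exactly the same velocity update as the weights:

    import numpy as np

    # Toy parameters and gradients (assumed to come from backpropagation)
    W = np.random.randn(3, 4)
    b = np.zeros(4)
    dW = np.random.randn(3, 4)   # dL/dW
    db = np.random.randn(4)      # dL/db

    lr, beta = 0.01, 0.9         # learning rate and momentum coefficient
    vW = np.zeros_like(W)        # velocity for the weights
    vb = np.zeros_like(b)        # velocity for the bias -- same rule as the weights

    # One momentum step: the bias uses exactly the same update rule
    vW = beta * vW - lr * dW
    vb = beta * vb - lr * db
    W += vW
    b += vb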

Related

Are the bias values actually adjusted, or only the weights of the connection channels between them and the neuron's layer?

I was reading some literature about ANNs and got a bit confused about how the biases are updated. I understand that the process is done through backpropagation, but I am confused about which part of the bias is actually adjusted, since I read that its value is always one.
So my question is whether the bias values are adjusted because their connection weights are updated, which causes the adjustment, or whether the actual value of one is what gets updated.
Thanks in advance!
Bias is just another parameter that is trained by computing derivatives, like every other part of the neural network. One can simulate a bias by concatenating an extra 1 to the activations of the previous layer, since
w · x + b = <[w, b], [x, 1]>
where [ ] denotes concatenation and < , > the dot product. Consequently, it is not the bias that is 1; the bias is just a trainable parameter. One can, however, think of a bias as a regular neuron-to-neuron connection whose input neuron is always equal to 1.
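A small NumPy check of the identity above (the particular numbers are made up for illustration):

    import numpy as np

    w = np.array([0.5, -1.2, 2.0])   # weights
    x = np.array([1.0, 3.0, -0.5])   # activations from the previous layer
    b = 0.7                          # bias

    lhs = np.dot(w, x) + b                              # w . x + b
    rhs = np.dot(np.append(w, b), np.append(x, 1.0))    # <[w, b], [x, 1]>
    print(np.isclose(lhs, rhs))      # True: the bias is just one more weight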

Can dropout increase training data performance?

I am training a neural network with dropout. It happens that as I decrease dropout from 0.9 to 0.7, the loss (cross-validation error) also decreases for the training data. I also noticed that accuracy increases as I reduce the dropout parameter.
It seems odd to me. Does it make sense?
Dropout is a regularization technique. You should use it only to reduce variance (the gap between validation performance and training performance). It is not intended to reduce bias, and you should not use it that way: it is very misleading.
The reason you probably see this behavior is that you use a very high value for dropout. 0.9 means you neutralize too many neurons. It makes sense that once you use 0.7 instead, the network has more neurons available while learning on the training set, so performance increases for lower dropout values.
You should usually see the training performance drop a bit while the performance on the validation set increases (if you do not have one, at least on the test set). That is the desired behavior you are looking for when using dropout; the behavior you currently get comes from the very high dropout values.
Start with 0.2 or 0.3 and compare the bias vs. variance in order to find a good value for dropout (a minimal sketch of such a sweep follows below).
My clear recommendation: don't use it to improve bias, but to reduce variance (error on the validation set).
In order to fit the training set better, I recommend:
- finding a better architecture (or changing the number of neurons per layer)
- trying different optimizers
- hyperparameter tuning
- maybe training the network a bit longer
Hopefully this helps!
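As a rough illustration of the sweep suggested above, here is a minimal tf.keras sketch (the layer sizes, toy data, and candidate rates are assumptions, not values from the question) that trains the same small network with a few dropout rates and compares training vs. validation loss:

    import numpy as np
    import tensorflow as tf

    # Toy data standing in for the real training set
    x = np.random.randn(1000, 20).astype("float32")
    y = (x.sum(axis=1) > 0).astype("float32")

    def build_model(rate):
        return tf.keras.Sequential([
            tf.keras.Input(shape=(20,)),
            tf.keras.layers.Dense(64, activation="relu"),
            tf.keras.layers.Dropout(rate),
            tf.keras.layers.Dense(1, activation="sigmoid"),
        ])

    for rate in (0.2, 0.3, 0.5, 0.7, 0.9):        # candidate dropout rates
        model = build_model(rate)
        model.compile(optimizer="adam", loss="binary_crossentropy")
        hist = model.fit(x, y, validation_split=0.2, epochs=10, verbose=0)
        print(rate, hist.history["loss"][-1], hist.history["val_loss"][-1])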
Dropout works by probabilistically removing, or “dropping out,” inputs to a layer, which may be input variables in the data sample or activations from a previous layer. It has the effect of simulating a large number of networks with a very different network structure and, in turn, making nodes in the network generally more robust to the inputs.
With dropout (a dropout rate below some moderate threshold), the accuracy will gradually increase and the loss will gradually decrease at first (that is what is happening in your case).
When you increase dropout beyond a certain threshold, the model is no longer able to fit properly. Intuitively, a higher dropout rate introduces higher variance in some of the layers, which also degrades training.
What you should always remember is that dropout, like all other forms of regularization, reduces model capacity. If you reduce the capacity too much, you will certainly get bad results.
Hope this helps.
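A tiny NumPy illustration of the capacity point (the layer size and rates are purely illustrative): at a dropout rate of 0.9, only about 10% of a layer's units are active on any given forward pass, while at 0.3 about 70% survive.

    import numpy as np

    rng = np.random.default_rng(0)
    units = 1000
    for rate in (0.3, 0.7, 0.9):             # dropout rate = probability of dropping a unit
        mask = rng.random(units) >= rate     # units that survive this forward pass
        print(rate, mask.mean())             # ~0.7, ~0.3, ~0.1 of the layer remains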

Why do we need biases in the neural network?

We have weights and an optimizer in the neural network.
Why can't we just compute W * input, then apply the activation, estimate the loss and minimize it?
Why do we need to do W * input + b?
Thanks for your answer!
There are two ways to think about why biases are useful in neural nets. The first is conceptual, and the second is mathematical.
Neural nets are loosely inspired by biological neurons. The basic idea is that human neurons take a bunch of inputs and "add" them together. If the sum of the inputs is greater than some threshold, then the neuron will "fire" (produce an output that goes to other neurons). This threshold is essentially the same thing as a bias. So, in this way, the bias in artificial neural nets helps to replicate the behavior of real, human neurons.
Another way to think about biases is simply by considering any linear function, y = mx + b. Let's say you are using y to approximate some linear function z. If z has a non-zero z-intercept, and you have no bias in the equation for y (i.e. y = mx), then y can never perfectly fit z. Similarly, if the neurons in your network have no bias terms, then it can be harder for your network to approximate some functions.
All that said, you don't "need" biases in neural nets--and, indeed, recent developments (like batch normalization) have made biases less frequent in convolutional neural nets.
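To make the y = mx + b point above concrete, here is a small NumPy least-squares sketch (the target function and data are made up for illustration): without a bias column, the fit is forced through the origin and cannot capture the intercept of z = 2x + 3.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, size=100)
    z = 2 * x + 3                                   # target with a non-zero intercept

    # Fit y = m*x (no bias): the design matrix has only the x column
    m_no_bias, *_ = np.linalg.lstsq(x[:, None], z, rcond=None)

    # Fit y = m*x + b: append a column of ones, i.e. the "always 1" bias input
    A = np.column_stack([x, np.ones_like(x)])
    (m, b), *_ = np.linalg.lstsq(A, z, rcond=None)

    print(m_no_bias)        # slope only; cannot represent the +3 offset
    print(m, b)             # recovers roughly 2 and 3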

Neural Network: validation accuracy constant, training accuracy decreasing

I have a neural network which does image segmentation. I trained it for ~100 epochs. The current effect is that the validation loss is constant (0.2 +/- 0.03) and the training accuracy is still decreasing (currently 0.07), but very slowly.
The result of the neural network is quite good.
What does this mean? Is it overfitting? Should I stop the training?
I currently use dropout in the first layer (50%). Would it make sense to add dropout to every layer (there are about ~15 layers)? Or should I also add L2 regularization? Does it make sense to use L2 and dropout together?
Thank you very much
It is recommended to use L2 when you use dropout. I think that your dropout at 50% is a little too high; people usually use it at around 20%, depending on the operations.
Moreover, 100 epochs may not be enough; it depends on the size of your training set and the size of your neural network.
What do you mean by "quite good"? Please quantify it and share an example. The validation loss and accuracy are just "indicators"; their values also depend on the network and the training set, so 0.2 can be either bad or good depending on your problem.
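A minimal tf.keras sketch of combining a lower dropout rate with L2 regularization, as suggested above (the layer sizes, input shape, and the 0.001 L2 factor are illustrative assumptions, not values from the question):

    import tensorflow as tf

    l2 = tf.keras.regularizers.l2(0.001)            # illustrative L2 strength

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(64,)),
        tf.keras.layers.Dense(128, activation="relu", kernel_regularizer=l2),
        tf.keras.layers.Dropout(0.2),               # lower rate than the 50% in the question
        tf.keras.layers.Dense(128, activation="relu", kernel_regularizer=l2),
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])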

Why is dropout preventing convergence in Convolutional Neural Network?

I am using tensorflow to train a convnet with a set of 15000 training images with 22 classes. I have 2 conv layers and one fully connected layer. I have trained the network with the 15000 images and have experienced convergence and high accuracy on the training set.
However, my test set is showing much lower accuracy, so I am assuming the network is overfitting. To combat this I added dropout before the fully connected layer of my network.
However, adding dropout has caused the network to never converge, even after many iterations. I was wondering why this may be. I have even used a high dropout probability (keep probability of 0.9) and have experienced the same results.
Well, by making your keep probability 0.9, there is a 10% chance of each neuron connection being dropped in every iteration, so for dropout too there is an optimum value.
As explained above, dropout also involves scaling the activations: with 0.5 dropout the scaling is one thing, and with a keep probability of 0.9 the scaling is different again.
Basically, with a keep probability of 0.9, the retained activations are scaled by 1/0.9 during training (inverted dropout), or equivalently the outputs are scaled by 0.9 at test time, so training and testing see activations of the same expected magnitude.
From this you can get an idea of how dropout can affect training: with some probability it can saturate your nodes, etc., which can cause the non-convergence issue.
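A minimal NumPy sketch of the scaling described above (inverted dropout, with illustrative values): the surviving activations are divided by the keep probability during training so that their expected value matches the un-dropped activations used at test time.

    import numpy as np

    rng = np.random.default_rng(0)
    activations = rng.uniform(0, 1, size=100000)    # toy layer activations
    keep_prob = 0.9

    mask = rng.random(activations.shape) < keep_prob
    train_out = activations * mask / keep_prob      # inverted dropout: scale by 1/keep_prob
    test_out = activations                          # no dropout, no scaling at test time

    print(train_out.mean(), test_out.mean())        # expected values roughly match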
You can add dropout to the dense layers after the convolutional layers and remove dropout from the convolutional layers. If you want many more examples, you can add some white noise (5% random pixels) to each picture and keep a P, P' variant of each picture. This can improve your results.
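A hedged NumPy sketch of the white-noise augmentation mentioned above (the 5% figure comes from the answer; the image shape, value range, and helper name are assumptions): for each picture P, create a variant P' with 5% of its pixels replaced by random values.

    import numpy as np

    rng = np.random.default_rng(0)

    def add_pixel_noise(image, fraction=0.05):
        # Return a copy of `image` with `fraction` of its pixels set to random values.
        noisy = image.copy()
        n_pixels = image.shape[0] * image.shape[1]
        n_noisy = int(fraction * n_pixels)
        rows = rng.integers(0, image.shape[0], size=n_noisy)
        cols = rng.integers(0, image.shape[1], size=n_noisy)
        noisy[rows, cols] = rng.random(size=(n_noisy,) + image.shape[2:])
        return noisy

    P = rng.random((64, 64, 3))        # toy image with values in [0, 1]
    P_prime = add_pixel_noise(P)       # augmented variant to add to the training set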
You shouldn't use 0.9 for dropout; by doing this you are losing features in your training phase. As far as I've seen, most dropout rates are between 0.2 and 0.5. Using too much dropout can cause problems in the training phase and a longer time to converge, or even, in some rare cases, cause the network to learn something wrong.
You need to be careful with the use of dropout: as you can see in the image below, dropout prevents features from getting to the next layer, so using too many dropout layers or a very high dropout value could kill the learning.
[Image: DropoutImage, showing dropout removing connections between layers]