How does a Neural Network "remember" what it's learned? - neural-network

I'm trying to wrap my head around understanding neural networks, and from everything I've seen, I understand that they are made up of layers of nodes. These nodes are connected to each other with "weighted" connections, and by passing values through the input layer, the values travel through the nodes, changing depending on the "weight" of the connections (right?). Eventually they reach the output layer with a value. I understand the process, but I don't see how this leads to the network being trained. Does the network remember a pattern between weighted connections? How does it remember that pattern?

Each weight and bias on each node is like a stored variable. As new data causes the weights and biases to change, these variables change. Eventually training is done and the weights and biases don't need to change anymore. You can then store the information about all the nodes, weights, biases, and connections however you like. This information is your model. So the "remembering" is just the values of the weights and biases.
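To make that concrete, here is a minimal NumPy sketch (the shapes, values, and filename are made up) of the idea that a trained model is nothing more than its stored weight and bias arrays:

```python
import numpy as np

# A "model" is just the learned arrays, so persisting it means saving them.
# Hypothetical single dense layer with 3 inputs and 2 outputs.
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 3))   # learned weights (random stand-ins here)
b = rng.normal(size=(2,))     # learned biases

np.savez("model.npz", W=W, b=b)   # store the network's "memory"

loaded = np.load("model.npz")     # restore it later
x = np.array([1.0, 0.5, -0.2])    # some new input
y = loaded["W"] @ x + loaded["b"] # forward pass with the remembered values
print(y)
```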

A neural network remembers what it has learned through its weights and biases. Let's explain with a binary classification example. During forward propagation, the value computed is a probability (say p) and the actual value is y. The loss is then calculated using the binary cross-entropy formula:
-(y log(p) + (1-y) log(1-p))
Once the loss is calculated, this information is propagated backwards and the corresponding derivatives of the weights and biases are calculated from this loss. The weights and biases are then adjusted according to these derivatives. In one epoch, all the examples present are propagated and the weights and biases are adjusted. Then the same examples are propagated forward and backward again, and at each step the weights and biases are adjusted. Finally, after minimizing the loss to a good extent, or achieving a high accuracy (make sure not to overfit), we can store the values of the weights and biases, and this is what the neural network has learned.
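As a toy illustration of this loop (a single sigmoid neuron trained on one made-up example; the variable names are mine, not from any particular library):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 0.8])   # one training example
y_true = 1.0                     # its actual label y

w = np.zeros(3)   # weights: the network's "memory"
b = 0.0           # bias
lr = 0.1          # learning rate

for epoch in range(100):
    # forward propagation: predicted probability p
    p = sigmoid(w @ x + b)
    # binary cross-entropy: -(y*log(p) + (1-y)*log(1-p))
    loss = -(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))
    # backward propagation: for sigmoid + BCE, dLoss/dz = p - y
    dz = p - y_true
    w -= lr * dz * x   # adjust weights using their derivatives
    b -= lr * dz       # adjust the bias using its derivative

print(w, b)  # these stored values are what has been "learned"
```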

Related

Are the bias values actually adjusted, or only the weights of the connection channels between them and the neuron's layer?

I was reading some literature about ANNs and got a bit confused about how the biases are updated. I understand that the process is done through backpropagation, but I am confused about which part of the bias is actually adjusted, since I read that its value is always one.
So my question is whether the bias values are adjusted because their connection-channel weights are updated, thereby causing the adjustment, or whether it is the actual value of one that is updated.
Thanks in advance!
Bias is just another parameter that is trained by computing derivatives, like every other part of the neural network. One can simulate a bias by concatenating an extra 1 to the activations of the previous layer, since
w·x + b = <[w b], [x 1]>
where [ ] is concatenation and < , > is the dot product. Consequently, it is not the bias that is 1; the bias is just a trainable parameter. But one can think of a bias as if it were a regular neuron-to-neuron connection where the input neuron is always equal to 1.
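A quick numerical check of this identity (a minimal NumPy sketch with arbitrary values):

```python
import numpy as np

w = np.array([0.3, -0.7, 2.0])   # weights
b = 0.5                          # bias
x = np.array([1.0, 2.0, -1.0])   # previous-layer activations

lhs = w @ x + b                                              # w·x + b
rhs = np.concatenate([w, [b]]) @ np.concatenate([x, [1.0]])  # <[w b], [x 1]>
assert np.isclose(lhs, rhs)
print(lhs, rhs)  # the bias behaves like a weight on a constant-1 input
```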

Understanding dropout in DNNs

From what I understand, DNN dropout regularization works like this:
Dropout:
First we randomly delete neurons from the DNN, leaving only the input and output the same. Then we perform forward propagation and backward propagation on a mini-batch, learn the gradient for this mini-batch, and update the weights and biases – I denote these updated weights and biases as Updated_Set_1.
Then we restore the DNN to its default state and randomly delete neurons again. Now we perform forward and backward propagation and find a new set of weights and biases called Updated_Set_2. This process continues up to Updated_Set_N, where N is the number of mini-batches.
Lastly, we calculate the average of all the weights and biases over Updated_Set_1 through Updated_Set_N. These averaged weights and biases are used to predict on new input.
I just want to confirm whether my understanding is correct or wrong. If wrong, please share your thoughts and teach me. Thank you in advance.
Well, actually there is no averaging. During training, for every forward/backward pass, we randomly "mute"/deactivate some neurons, so that their outputs and related weights are not considered during computation of the output, nor during backpropagation.
That means we are forcing the other, active neurons to give good predictions without the help of the deactivated neurons.
This increases their independence from the other neurons (features) and in the same way improves the model's generalization.
Other than this, the forward and backward propagation phases are the same as without dropout.
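A minimal sketch of what the training-time masking looks like (this uses the common "inverted dropout" scaling so that nothing needs to be averaged or rescaled at prediction time; names and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, p_drop, training=True):
    """Randomly 'mute' neurons during training; identity at test time."""
    if not training:
        return activations
    mask = rng.random(activations.shape) >= p_drop  # keep with prob 1 - p_drop
    # inverted-dropout scaling keeps the expected activation unchanged,
    # so the full network can be used as-is for prediction
    return activations * mask / (1.0 - p_drop)

h = np.array([0.2, 1.5, -0.3, 0.8])            # some hidden activations
print(dropout(h, p_drop=0.5))                   # some entries zeroed out
print(dropout(h, p_drop=0.5, training=False))   # unchanged at test time
```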

weight update of one random layer in a multilayer neural network using backpropagation?

When training multi-layer neural networks with backpropagation, the weights of all layers are updated in each iteration.
I am wondering what happens if we randomly select one layer and update only that layer's weights in each iteration of backpropagation.
How would this impact training time? Does model performance (the generalization capability of the model) suffer from this type of training?
My intuition is that the generalization capability will be the same and the training time will be reduced. Please correct me if I am wrong.
Your intuition is wrong. What you are proposing is block coordinate descent, and while it makes sense to do something like this when the gradients are not correlated, it does not make sense in this context.
The problem for NNs is that, due to the chain rule, computing the gradient of any single layer already requires backpropagating the error through the layers after it, which gives you their gradients essentially for free. Therefore, you would just be discarding this information for no good reason.
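To see why those gradients come for free, here is a toy two-layer backward pass in NumPy (made-up shapes and a squared-error loss): updating only W1 still forces you to compute delta2 on the way down, at which point layer 2's weight gradient is just one extra outer product.

```python
import numpy as np

rng = np.random.default_rng(0)

x = rng.normal(size=3)                    # one input
W1 = rng.normal(size=(4, 3))              # first-layer weights
W2 = rng.normal(size=(1, 4))              # second-layer weights
y_true = 1.0

# forward pass
h = np.tanh(W1 @ x)                       # hidden activations
y = W2 @ h                                # linear output
loss = 0.5 * (y - y_true) ** 2

# backward pass: chain rule
delta2 = y - y_true                       # dLoss/dy
grad_W2 = np.outer(delta2, h)             # layer-2 gradient
delta1 = (W2.T @ delta2) * (1 - h ** 2)   # reuses delta2: nearly free
grad_W1 = np.outer(delta1, x)             # layer-1 gradient

# Updating ONLY W1 still required computing delta2 on the way down,
# so discarding grad_W2 saves just one outer product.
```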

Issues with neural network

I am having some issues using a neural network. I am using a nonlinear activation function for the hidden layer and a linear function for the output layer. Adding more neurons to the hidden layer should have increased the capability of the NN and made it fit the training data better / have less error on the training data.
However, I am seeing a different phenomenon. Adding more neurons decreases the accuracy of the neural network, even on the training set.
Here is the graph of the mean absolute error with an increasing number of neurons. The accuracy on the training data is decreasing. What could be the cause of this?
Could it be that MATLAB's nntool, which I am using, splits the data randomly into training, test, and validation sets for checking generalization, instead of using cross-validation?
Also, I see lots of negative output values when adding neurons, while my targets are supposed to be positive. Could this be another issue?
I am not able to explain the behavior of the NN here. Any suggestions? Here is a link to my data, consisting of the covariates and targets:
https://www.dropbox.com/s/0wcj2y6x6jd2vzm/data.mat
I am unfamiliar with nntool, but I would suspect that your problem is related to the selection of your initial weights. Poor initial weight selection can lead to very slow convergence or failure to converge at all.
For instance, notice that as the number of neurons in the hidden layer increases, the number of inputs to each neuron in the output layer also increases (one for each hidden unit). Say you are using a logistic sigmoid in your hidden layer (whose output is always positive) and pick your initial weights from a random uniform distribution over a fixed interval. Then as the number of hidden units increases, the total input to each neuron in the output layer will also increase, because there are more incoming connections. With a very large number of hidden units, your initial solution may become very large and result in poor convergence.
Of course, how this all behaves depends on your activation functions, the distribution of the data, and how it is normalized. I would recommend looking at "Efficient BackProp" by Yann LeCun for some excellent advice on normalizing your data and selecting initial weights and activation functions.
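Here is a toy experiment sketching this effect (the distributions and sizes are arbitrary; the 1/sqrt(n) factor in the scaled branch is the kind of fan-in-based initialization that paper recommends):

```python
import numpy as np

rng = np.random.default_rng(0)

def output_preactivation(n_hidden, scaled=False):
    """Pre-activation of a single output neuron fed by n_hidden sigmoids."""
    x = rng.uniform(-1, 1, size=10)
    W1 = rng.uniform(-0.5, 0.5, size=(n_hidden, 10))
    h = 1.0 / (1.0 + np.exp(-(W1 @ x)))       # sigmoid outputs, all in (0, 1)
    w2 = rng.uniform(-0.5, 0.5, size=n_hidden)
    if scaled:
        w2 /= np.sqrt(n_hidden)               # fan-in scaling tames the growth
    return w2 @ h

for n in (10, 100, 1000):
    vals = [abs(output_preactivation(n)) for _ in range(200)]
    print(n, np.mean(vals))  # grows roughly like sqrt(n) without scaling
```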

neural networks and backpropagation: justification for removeconstantrows in MATLAB

I was wondering: MATLAB has a removeconstantrows function that should be applied to feedforward neural network input and target output data. This function removes constant rows from the data. For example, if one input vector for a 5-input neural network is [1 1 1 1 1], then it is removed.
Googling, the best explanation I could find is that (paraphrasing) "constant rows are not needed and can be replaced by appropriate adjustments to the biases of the output layer".
Can someone elaborate?
Who does this adjustment?
From my book, the weight adjustment for simple gradient descent is:
Δ weight_i = learning_rate * local_gradient * input_i
This means that all the weights of a neuron in the first hidden layer are adjusted by the same amount. But they ARE adjusted.
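For concreteness, a minimal sketch of the arithmetic I mean, with made-up numbers:

```python
# With a constant input pattern, every weight of a first-layer neuron
# receives the same update via: delta_w_i = learning_rate * local_gradient * input_i
learning_rate = 0.1
local_gradient = 0.8
inputs = [1, 1, 1, 1, 1]
deltas = [learning_rate * local_gradient * x_i for x_i in inputs]
print(deltas)  # five equal values (0.08), all nonzero
```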
I think there is a misunderstanding. The "row" is not an input pattern, but a feature, that is, the i-th component across all patterns. It's obvious that if some feature does not vary at all over the data set, it does not provide valuable information and does not play a noticeable role in network training.
The comparison to a bias is reasonable (though I don't agree that this applies to the output layer only, because it depends on where the constant row is found; if it's in the input data, then the same holds for the first hidden layer, imho). If you remember, it's recommended for each neuron in a backpropagation network to have a special bias weight connected to a constant signal of 1. If, for example, a training set contains a row of all 1s, then this is the same as an additional bias. If the constant row has a different value, the bias will have a different effect, but in any case you can simply eliminate the row and add the constant value of the row into the existing bias.
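To illustrate that folding trick, a small NumPy check (arbitrary, made-up numbers): if feature i equals the constant c in every pattern, its contribution w_i * c can be absorbed into the bias and the row dropped.

```python
import numpy as np

c = 3.0
w = np.array([0.4, -1.1, 0.7])   # incoming weights of one first-layer neuron
b = 0.2                          # its existing bias
x = np.array([2.0, c, -0.5])     # one input pattern; x[1] == c in ALL patterns

full = w @ x + b                 # neuron input with the constant row kept

w_rest = np.delete(w, 1)         # drop the weight of the constant row
x_rest = np.delete(x, 1)         # drop the constant row itself
folded = w_rest @ x_rest + (b + w[1] * c)   # absorb w_1 * c into the bias

assert np.isclose(full, folded)
print(full, folded)              # identical: the row carried no information
```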
Disclaimer: I'm not a MATLAB user. My background in neural networks comes solely from programming.