How is the loss calculated in an RNN/LSTM? - neural-network

I'm learning how LSTMs work by practicing with time series training data (the input is a list of features and the output is a scalar).
There is one thing I couldn't understand about calculating the loss for an RNN/LSTM:
How is the loss calculated? Is it calculated each time I give the network a new input, or is it accumulated over all the given inputs and then backpropagated?

@seed's answer is correct. However, in an LSTM, or any RNN architecture, the loss for each instance is added up across all time steps. In other words, you'll have (L0 at t0, L1 at t1, ..., LT at tT) for each sample in your input batch. Add those losses separately for each instance in the batch. Finally, average the losses over the input instances to get the average loss for the current batch.
For more information please visit: https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks
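A minimal Python sketch of that bookkeeping, assuming the per-time-step losses have already been computed (the numbers and the name batch_loss are made up for illustration). If the network emits only one scalar per sequence, as in the question, each inner list would simply hold a single loss for the final step.

```python
# Hypothetical per-time-step losses for a batch of 2 sequences, 3 time steps each.
# In a real model each value would come from comparing a prediction to its target.
step_losses = [
    [0.5, 0.25, 0.25],  # sequence 0: L0, L1, L2
    [1.0, 0.5, 0.5],    # sequence 1: L0, L1, L2
]


def batch_loss(step_losses):
    """Sum the losses over time for each sequence, then average over the batch."""
    per_sequence = [sum(sequence) for sequence in step_losses]
    return sum(per_sequence) / len(per_sequence)


loss = batch_loss(step_losses)
print(loss)  # 1.5
```

Backpropagation through time would then start from this single number.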

The answer does not depend on the neural network model.
It depends on your choice of optimization method.
If you are using batch gradient descent, the loss is averaged over the whole training set. This is often impractical for neural networks, because the training set is frequently too big to fit into RAM and each optimization step takes a lot of time.
In stochastic gradient descent, the loss is calculated for each new input. The problem with this method is that the resulting updates are noisy.
In mini-batch gradient descent, the loss is averaged over each new minibatch - a subsample of inputs of some small fixed size. Some variation of this method is typically used in practice.
So, the answer to your question depends on the minibatch size you choose.
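To make the three cases concrete, here is a small Python sketch under simplifying assumptions: the per-sample losses are made-up numbers and are held fixed, whereas in real training the weights (and so the later losses) change after every update. The helper names are hypothetical.

```python
def mean_loss(losses):
    """Average a list of per-sample losses."""
    return sum(losses) / len(losses)


def minibatches(items, batch_size):
    """Split items into consecutive chunks of at most batch_size elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical per-sample losses for a training set of 6 samples.
sample_losses = [4.0, 2.0, 1.0, 3.0, 6.0, 2.0]

# Batch gradient descent: one loss value (and one update) per pass over the data.
full_batch = [mean_loss(b) for b in minibatches(sample_losses, len(sample_losses))]

# Stochastic gradient descent: one loss value per sample.
stochastic = [mean_loss(b) for b in minibatches(sample_losses, 1)]

# Mini-batch gradient descent with batch size 2: one loss value per mini-batch.
mini_batch = [mean_loss(b) for b in minibatches(sample_losses, 2)]

print(full_batch)   # [3.0]
print(stochastic)   # [4.0, 2.0, 1.0, 3.0, 6.0, 2.0]
print(mini_batch)   # [3.0, 2.0, 4.0]
```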

Related

How to deal with the randomness of NN training process?

Consider training a deep feed-forward neural network using mini-batch gradient descent. As far as I understand, at each epoch of training we have a different random set of mini-batches. Iterating over all mini-batches and computing the gradients of the NN parameters, we get random gradients at each iteration and, therefore, random directions in which the model parameters move to minimize the cost function. Suppose we fix the hyperparameters of the training algorithm and run the training process again and again. We would then end up with models that differ completely from each other, because the changes to the model parameters were different in each run.
1) Is this always the case when we use such randomness-based training algorithms?
2) If so, where is the guarantee that training the NN one more time with the best hyperparameters found during the previous trainings and validations will yield the best model again?
3) Is it possible to find hyperparameters that will always yield the best model?
A neural network is solving an optimization problem. As long as each step follows a gradient that points, on average, in a descent direction, the randomness does not hurt its objective of generalizing over the data. Training can get stuck in a local optimum, but there are many good methods, such as Adam, RMSProp, and momentum-based optimizers, that help it reach its objective.
Another reason: a mini-batch contains at least some samples over which the network can generalize. The error rate may fluctuate, but training will at least reach a local solution.
Moreover, each random sampling gives the mini-batches different samples, which helps the network generalize well over the complete distribution.
For hyperparameter selection, you need to tune and validate the results on unseen data; there is no straightforward method for choosing them.
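One source of the run-to-run variation discussed here is the order in which samples are shuffled into mini-batches. The Python sketch below (the function name shuffled_batches and its parameters are hypothetical) shows how fixing a seed makes that part repeatable. It does not remove other sources of randomness such as weight initialization.

```python
import random


def shuffled_batches(n_samples, batch_size, seed):
    """Return mini-batches of sample indices, shuffled reproducibly by seed."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    indices = list(range(n_samples))
    rng.shuffle(indices)
    return [indices[i:i + batch_size] for i in range(0, n_samples, batch_size)]


# The same seed gives the same batches on every call; another seed generally
# gives a different order.
run_a = shuffled_batches(10, 4, seed=0)
run_b = shuffled_batches(10, 4, seed=0)
print(run_a == run_b)  # True
```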

Neural network parameter selection

I am looking at (two-layer) feed-forward Neural Networks in Matlab. I am investigating parameters that can minimise the classification error.
A google search reveals that these are some of them:
Number of neurons in the hidden layer
Learning Rate
Momentum
Training type
Epoch
Minimum Error
Any other suggestions?
I've varied the number of hidden neurons in Matlab, varying it from 1 to 10. I found that the classification error is close to 0% with 1 hidden neuron and then grows very slightly as the number of neurons increases. My question is: shouldn't a larger number of hidden neurons guarantee an equal or better answer, i.e. why might the classification error go up with more hidden neurons?
Also, how might I vary the Learning Rate, Momentum, Training type, Epoch and Minimum Error in Matlab?
Many thanks
Since you are considering a simple two-layer feed-forward network and have already listed six things to tune in order to reduce classification error, I want to add just one more: the amount of training data. A neural network trained on more data will generally work better. Training with a large amount of data is key to getting good results from neural networks, especially deep ones.
Why the classification error goes up with more hidden neurons?
The likely answer is overfitting: your model has fitted the training data too closely, which results in poor performance on new data. Increasing the number of neurons in the hidden layer typically decreases the training error, but beyond some point it can increase the test error.
[Figure: what happens as the hidden layer size increases]
How may I vary the Learning Rate, Momentum, Training type, Epoch and Minimum Error in Matlab?
I assume you have already seen the feed-forward neural network function in Matlab. You just need to change the second parameter of feedforwardnet(hiddenSizes,trainFcn), namely trainFcn, the training function.
For example, if you want to use gradient descent with momentum and adaptive learning rate backpropagation, use traingdx as the training function. You can also use traingda if you want gradient descent with adaptive learning rate backpropagation but no momentum.
You can change all the parameters of the training function as you like. For example, if you want to use traingda, you just need to follow these two steps.
Set net.trainFcn to traingda. This sets net.trainParam to traingda's default parameters.
Set net.trainParam properties to desired values.
Example
net = feedforwardnet(3,'traingda');
net.trainParam.lr = 0.05; % set the learning rate to 0.05
net.trainParam.epochs = 2000; % set the maximum number of epochs
For details, see the Matlab documentation for traingda (gradient descent with adaptive learning rate backpropagation) and traingdx (gradient descent with momentum and adaptive learning rate backpropagation).

How to improve digit recognition prediction in Neural Networks in Matlab?

I've made digit recognition (56x56 digits) using Neural Networks, but I'm getting 89.5% accuracy on test set and 100% on training set. I know that it's possible to get >95% on test set using this training set. Is there any way to improve my training so I can get better predictions? Changing iterations from 300 to 1000 gave me +0.12% accuracy. I'm also file size limited so increasing number of nodes can be impossible, but if that's the case maybe I could cut some pixels/nodes from the input layer.
To train I'm using:
input layer: 3136 nodes
hidden layer: 220 nodes
labels: 36
regularized cost function with lambda=0.1
fmincg to calculate weights (1000 iterations)
As mentioned in the comments, the easiest and most promising way is to switch to a Convolutional Neural Network. But with your current model you can:
Add more layers with fewer neurons each, which increases learning capacity and should increase accuracy a bit. The problem is that you might start overfitting. Use regularization to counter this.
Use Batch Normalization (BN). While you are already using regularization, BN accelerates training and also has a regularizing effect, and it is an NN-specific technique that might work better.
Make an ensemble. Train several NNs on the same dataset, but with different initializations. This will produce slightly different classifiers, and you can combine their outputs to get a small increase in accuracy.
Cross-entropy loss. You don't mention which loss function you are using. If it's not cross-entropy, you should start using it. Most high-accuracy classifiers use cross-entropy loss.
Switch to Stochastic Gradient Descent with backpropagation. I do not know the effect of using a different optimization algorithm, but SGD might outperform the optimizer you are currently using, and you could also try variants such as Adagrad or Adam.
Other small changes that might increase accuracy are changing the activation functions (e.g. to ReLU), shuffling the training samples after every epoch, and doing data augmentation.
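The ensemble suggestion can be sketched in a few lines of Python. This assumes each trained network outputs a list of class probabilities for the same input. The three probability vectors below are invented, and averaging is only one of several ways to combine classifiers.

```python
def ensemble_predict(prob_lists):
    """Average class probabilities from several models and pick the best class."""
    n_models = len(prob_lists)
    n_classes = len(prob_lists[0])
    averaged = [
        sum(probs[c] for probs in prob_lists) / n_models
        for c in range(n_classes)
    ]
    best_class = max(range(n_classes), key=lambda c: averaged[c])
    return best_class, averaged


# Hypothetical outputs of three networks for one image, over 3 classes.
model_probs = [
    [0.5, 0.25, 0.25],   # network 1 prefers class 0
    [0.25, 0.5, 0.25],   # network 2 prefers class 1
    [0.5, 0.25, 0.25],   # network 3 prefers class 0
]

label, averaged = ensemble_predict(model_probs)
print(label)  # 0
```

For the 36-label problem in the question, each probability list would have 36 entries instead of 3.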

Accuracy of Neural network Output-Matlab ANN Toolbox

I'm relatively new to Matlab ANN Toolbox. I am training the NN with pattern recognition and target matrix of 3x8670 containing 1s and 0s, using one hidden layer, 40 neurons and the rest with default settings. When I get the simulated output for new set of inputs, then the values are around 0 and 1. I then arrange them in descending order and choose a fixed number(which is known to me) out of 8670 observations to be 1 and rest to be zero.
Every time I run the program, the first row of the simulated output always has close to 100% accuracy, while the following rows don't show the same accuracy.
Is there a logical explanation in general? I understand that answering this conclusively might require an understanding of the program and the problem, but it's made up of too many functions to explain clearly. Can I make some changes to the training to get consistent output?
If you have any suggestions please share it with me.
Thanks,
Nishant
Your problem statement is not clear to me. For example, what do you mean by: "I then arrange them in descending order and choose a fixed number ..."
As I understand it, the output of your NN does not match the real target. If so, there are several possibilities to consider:
How do you divide the data into training/test/validation sets for the training phase? The largest share should go to training (around 75%) and the rest to test/validation.
What does your training data set look like? Does it cover most of the scenarios you expect? If the training set is not reasonably similar to the test set (e.g., the test set contains records/samples unlike anything that appeared during training, i.e. outliers), the NN cannot handle those samples well, its outputs may be out of range, and it cannot reach the accuracy you need. In that case a clustering approach may suit the problem better than NN classification. An NN works well when the training and test sets do not differ much. Otherwise, it is not appropriate.
Sometimes the training data set is appropriate, but the problem is the training itself. In that case you may need another type of model, because feed-forward NNs such as the MLP may not cope well with compact, poorly separated regions of data. You may need a stronger function approximator such as an RBF network or an SVM.
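As a rough illustration of the split suggested above, the Python sketch below shuffles sample indices and assigns about 75% to training, with the rest divided between validation and test. The function name, the 12.5% validation share, and the seed are assumptions made for the example. Only the 8670 sample count comes from the question.

```python
import random


def split_indices(n_samples, train_frac=0.75, val_frac=0.125, seed=0):
    """Shuffle sample indices and split them into train/validation/test lists."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = int(n_samples * train_frac)
    n_val = int(n_samples * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


train, val, test = split_indices(8670)
print(len(train), len(val), len(test))  # 6502 1083 1085
```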

ANN different results for same train-test sets

I'm implementing a neural network for a supervised classification task in MATLAB.
I have a training set and a test set to evaluate the results.
The problem is that every time I train the network for the same training set I get very different results (sometimes I get a 95% classification accuracy and sometimes like 60%) for the same test set.
Now I know this is because I get different initial weights and I know that I can use 'seed' to set the same initial weights but the question is what does this say about my data and what is the right way to look at this? How do I define the accuracy I'm getting using my designed ANN? Is there a protocol for this (like running the ANN 50 times and get an average accuracy or something)?
Thanks
Make sure your test set is large enough compared to the training set (e.g. 10% of the overall data) and check that it is diverse. If your test set only covers very specific cases, this could be a reason. Also make sure you always use the same test set. Alternatively, look into cross-validation.
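For reference, k-fold cross-validation rotates which part of the data serves as the test set. The Python sketch below only generates the index splits (the name k_fold_indices is hypothetical). In practice the data would usually be shuffled first, and a model would be trained and evaluated once per fold.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
print([len(test) for _, test in folds])  # [4, 3, 3]
```

The reported accuracy is then the average over the k test folds.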
Furthermore, good training set accuracy combined with bad test set accuracy is a sign of overfitting. Try applying regularization such as a simple L2 weight decay (simply multiply your weight matrices by e.g. 0.999 after each weight update). Depending on your data, Dropout or L1 regularization could also help (especially if you have a lot of redundancy in your input data). Also try a smaller network topology (fewer layers and/or fewer neurons per layer).
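The weight decay trick mentioned above (multiplying the weights by e.g. 0.999 after each update) can be sketched in Python as follows. The function name, learning rate, and flat list of weights are simplifying assumptions. A real network would apply the same idea to each weight matrix.

```python
def sgd_step_with_decay(weights, grads, lr=0.1, decay=0.999):
    """One gradient descent update followed by multiplicative weight decay."""
    updated = [w - lr * g for w, g in zip(weights, grads)]
    # Shrink every weight slightly towards zero after the update.
    return [decay * w for w in updated]


# With zero gradients only the decay acts, so each weight shrinks by 0.1%.
weights = sgd_step_with_decay([1.0, -2.0], [0.0, 0.0])
print(weights)
```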
To speed up training, you could also try alternative learning algorithms like RPROP+, RPROP- or RMSProp instead of plain backpropagation.
Looks like your ANN is not converging to the optimal set of weights. Without further details of the ANN model, I cannot pinpoint the problem, but I would try increasing the number of iterations.
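On the protocol asked about in the question (training several times and averaging), one common way to report results is the mean and spread of the test accuracy over repeated runs. The Python sketch below assumes the accuracies have already been collected. The five values are invented, loosely following the 60% to 95% range described in the question.

```python
import statistics


def summarize_runs(accuracies):
    """Summarize test accuracy over repeated training runs."""
    return {
        "mean": statistics.mean(accuracies),
        "stdev": statistics.stdev(accuracies),  # sample standard deviation
        "min": min(accuracies),
        "max": max(accuracies),
    }


# Hypothetical test accuracies from five runs with different initial weights.
run_accuracies = [0.95, 0.60, 0.90, 0.85, 0.70]
summary = summarize_runs(run_accuracies)
print(summary)
```

A large standard deviation, as in this made-up example, suggests that a single run is not a reliable estimate of the network's accuracy.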