Computational time of forward-propagation vs. back-propagation in a neural network? - neural-network

What are the differences in computational time between carrying out the dot products etc. in forward-propagation and the derivatives etc. in back-propagation for neural networks? Also, is the weight update procedure considered part of the backward pass computational time? Does the ratio of forward-pass to backward-pass computational time change depending on the architecture, e.g. CNN, RNN, LSTM? Thank you.

Related

Weight update of one random layer in a multilayer neural network using backpropagation?

When training multi-layer neural networks using back-propagation, the weights of all layers are updated in each iteration.
I am wondering what happens if we randomly select one layer and update only that layer's weights in each iteration of back-propagation.
How will this impact training time? Does model performance (the generalization capability of the model) suffer from this type of training?
My intuition is that the generalization capability will be the same and the training time will be reduced. Please correct me if I am wrong.
Your intuition is wrong. What you are proposing is block coordinate descent, and while it makes sense to do something like this if the gradients are not correlated, it does not make sense in this context.
The problem in NNs is that, due to the chain rule, while you calculate the gradient for any single layer you get the gradients of the other layers on the backward path essentially for free. Therefore, you would just be discarding this information for no good reason.
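To make the chain-rule point concrete, here is a minimal sketch (my own toy example, not from the answer above): a hypothetical three-layer scalar network with tanh activations and a squared-error loss. The names `forward`, `backward` and `numeric_grad` are made up for illustration. Notice that reaching the gradient of the first weight already requires the intermediate terms `d_y` and `d_z2`, so the gradients of the other two weights cost one extra multiplication each.

```python
import math

def forward(w, x):
    """Forward pass of a tiny 3-layer scalar network; returns all activations."""
    a1 = math.tanh(w[0] * x)
    a2 = math.tanh(w[1] * a1)
    y = w[2] * a2
    return a1, a2, y

def loss(w, x, t):
    """Squared-error loss for a single training example (x, t)."""
    return 0.5 * (forward(w, x)[2] - t) ** 2

def backward(w, x, t):
    """Backpropagation: each layer's gradient reuses the terms computed before it."""
    a1, a2, y = forward(w, x)
    d_y = y - t                          # dL/dy
    g3 = d_y * a2                        # dL/dw3, one extra multiplication
    d_z2 = d_y * w[2] * (1 - a2 ** 2)    # needed on the way to layers 2 and 1
    g2 = d_z2 * a1                       # dL/dw2, one extra multiplication
    d_z1 = d_z2 * w[1] * (1 - a1 ** 2)   # needed for layer 1
    g1 = d_z1 * x                        # dL/dw1
    return [g1, g2, g3]

def numeric_grad(w, x, t, eps=1e-6):
    """Central finite differences, used only to check the analytic gradients."""
    grads = []
    for i in range(len(w)):
        up, down = list(w), list(w)
        up[i] += eps
        down[i] -= eps
        grads.append((loss(up, x, t) - loss(down, x, t)) / (2 * eps))
    return grads

weights = [0.5, -0.3, 0.8]
analytic = backward(weights, x=1.2, t=0.4)
numeric = numeric_grad(weights, x=1.2, t=0.4)
```

Updating only one randomly chosen weight per iteration would still pay for most of this backward pass while throwing away gradients that were already computed.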

Is a neural network suitable for supervised learning where the data (inputs and outputs) are continuous?

I am working on a regression model with a set of 158 inputs and 4 outputs for a glass manufacturing project, which is a continuous process of inputs and outputs. Is a neural net a suitable solution for this kind of regression model? If yes, I understand that recurrent neural nets can be used for time series data; which recurrent neural net should I use? If a NN is not suitable, what other types of solutions are available besides Linear Regression and Regression Trees?
Neural networks are indeed suitable for continuous data. In fact, I would say they are continuous by default. Discrete I/O is certainly possible as well; it all depends on your functions.
Secondly, it is true that RNNs are suitable for time series, in a way. RNNs are in fact suited to timesteps more than timestamps. RNNs work by iterations, and typically each iteration can be seen as a fixed step forward in time. That said, if your data is more like (date, value) pairs (what I call timestamps), an RNN may not be such a good fit. It would not be absolutely impossible, but that's not the idea.
Hope it helps. Start with a simple RNN and try to understand how it works; then, if you need more, read about more complex cells.
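One common way to bridge the timestep/timestamp gap is to resample irregular (timestamp, value) pairs onto a regular grid before feeding them to an RNN. The sketch below is only an illustration of that idea, not something the answer prescribes. It uses plain linear interpolation, assumes strictly increasing numeric timestamps (e.g. seconds since the first sample), and the function name `resample_to_fixed_step` is made up.

```python
def resample_to_fixed_step(timestamps, values, step):
    """Linearly interpolate irregular samples onto a grid with a fixed step.

    Assumes timestamps are numeric and strictly increasing.
    """
    if len(timestamps) != len(values) or len(timestamps) < 2:
        raise ValueError("need at least two (timestamp, value) pairs")
    if step <= 0:
        raise ValueError("step must be positive")

    start, end = timestamps[0], timestamps[-1]
    resampled = []
    i = 0  # index of the sample at or before the current grid point
    k = 0  # grid point counter, avoids accumulating rounding error
    while True:
        t = start + k * step
        if t > end:
            break
        # Advance to the interval [timestamps[i], timestamps[i + 1]] containing t.
        while timestamps[i + 1] < t:
            i += 1
        t0, t1 = timestamps[i], timestamps[i + 1]
        frac = (t - t0) / (t1 - t0)
        resampled.append(values[i] + frac * (values[i + 1] - values[i]))
        k += 1
    return resampled

# Irregular samples at t = 0, 1, 4, 6, resampled every 2 time units.
fixed = resample_to_fixed_step([0, 1, 4, 6], [0.0, 1.0, 4.0, 2.0], step=2)
```

Each element of `fixed` then corresponds to one RNN iteration. Whether interpolation is appropriate depends on the process being measured.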

General rules for training the RNN model when loss stops decreasing

I have an RNN model. After about 10K iterations, the loss stops decreasing, but the loss is not very small yet. Does this always mean the optimization is trapped in a local minimum?
In general, what actions should I take to address this issue? Add more training data? Switch to a different optimization scheme (I use SGD now)? Or are there other options?
Many thanks!
JC
If you are training your neural network using a gradient-based algorithm such as Back Propagation or Resilient Propagation, it can stop improving when it finds a local minimum, and this is normal given the nature of this type of algorithm. The propagation algorithm only searches in the direction the gradient vector points.
As a suggestion, you could add a different strategy during training that explores the search space instead of only searching locally, for example a Genetic Algorithm or Simulated Annealing. These approaches explore other possibilities and may find a global minimum. You could run 10 iterations of the exploratory algorithm for every 200 iterations of the propagation algorithm, creating a hybrid strategy. For example (this is just pseudo-code):
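int epochs = 0;
do
{
    train();    // one iteration of the propagation (gradient-based) training
    epochs++;
    if (epochs % 200 == 0)
    {
        // every 200 iterations, run 10 iterations of the exploratory search
        for (int i = 0; i < 10; i++)
            trainExplorativeApproach();
    }
} while (epochs < 10000);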
I've developed a strategy like this using Multi-Layer Perceptrons and Elman recurrent neural networks for classification and regression problems, and in both cases the hybrid strategy provided better results than propagation training alone.
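As a rough, runnable illustration of the hybrid idea, here is a sketch on a toy one-dimensional loss with one local and one global minimum. Everything in it is an assumption for demonstration purposes: the loss function, the learning rate, and the `explore` step, which is a greedy random-perturbation search standing in for Simulated Annealing or a Genetic Algorithm (it only accepts moves that lower the loss). It is not the implementation described above.

```python
import random

def loss(x):
    """Toy loss with a local minimum near x = 1.13 and a global one near x = -1.30."""
    return x ** 4 - 3 * x ** 2 + x

def grad(x):
    """Derivative of the toy loss."""
    return 4 * x ** 3 - 6 * x + 1

def explore(x, rng, tries=10, sigma=1.5):
    """Greedy random search: try a few random jumps, keep only improvements."""
    for _ in range(tries):
        candidate = x + rng.gauss(0.0, sigma)
        if loss(candidate) < loss(x):
            x = candidate
    return x

def train(x0, iterations=10000, lr=0.01, explore_every=None, seed=0):
    """Gradient descent, optionally interleaved with exploratory bursts."""
    rng = random.Random(seed)
    x = x0
    for it in range(1, iterations + 1):
        x = x - lr * grad(x)  # ordinary gradient step
        if explore_every and it % explore_every == 0:
            x = explore(x, rng)  # 10 exploratory tries every `explore_every` steps
    return x

plain_x = train(2.0)                      # gradient descent only
hybrid_x = train(2.0, explore_every=200)  # hybrid strategy
plain_loss, hybrid_loss = loss(plain_x), loss(hybrid_x)
```

Starting from x = 2.0, plain gradient descent settles in the nearby local minimum, while the exploratory bursts give the hybrid run a chance to jump into the deeper basin. On a real network the exploratory step would perturb the weight vector, and there is no guarantee of reaching the global minimum.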

What is cost function in neural network?

Could someone please explain to me why the cost function is so important in a neural network? What is its purpose?
Note: I'm just getting started with neural networks and have not fully understood the subject yet.
In artificial neural networks, the cost function returns a number representing how well the neural network performed at mapping training examples to the correct output.
See here and here
In other words, after you train a neural network, you have a mathematical model whose weights were adjusted to give a better result. The weights and the activation function of each neuron together form one main function, which is the neural network. The cost function measures how far that function's outputs are from the correct outputs, and the purpose of the training step is to adjust the weights so that this cost decreases.
The cost function returns a scalar value called the 'cost', which tells you how good or bad your model is. There are several cost functions that can be used. A lower cost indicates a better model. The reason cost functions are used in neural networks is that the cost is what the model uses to improve.
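For a concrete picture, here is a minimal sketch of one commonly used cost function, the mean squared error. The function name and the example numbers are made up for illustration. The answers above do not say which cost function to use, so treat this as one possible choice.

```python
def mean_squared_error(predictions, targets):
    """Average squared difference between network outputs and correct outputs."""
    if len(predictions) != len(targets) or not targets:
        raise ValueError("predictions and targets must be non-empty and equal in length")
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

targets = [1.0, 0.0, 1.0]             # correct outputs for three training examples
good_predictions = [0.9, 0.1, 0.8]    # outputs of a model that fits well
bad_predictions = [0.2, 0.7, 0.4]     # outputs of a model that fits poorly

good_cost = mean_squared_error(good_predictions, targets)
bad_cost = mean_squared_error(bad_predictions, targets)
```

Here `good_cost` is smaller than `bad_cost`, and a perfect model would have a cost of zero. Training adjusts the weights to push this single number down.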

Parameter settings for neural networks based classification using Matlab

Recently, I have been trying to use Matlab's built-in neural networks toolbox to solve my classification problem. However, I have some questions about the parameter settings.
a. The number of neurons in the hidden layer:
The example on this page (Matlab neural networks classification example) shows a two-layer (i.e. one hidden layer and one output layer) feed-forward neural network. The example uses 10 neurons in the hidden layer:
net = patternnet(10);
My first question is how to determine the best number of neurons for my classification problem. Should I use cross-validation on a training data set to find the best-performing number of neurons?
b. Is there a method for choosing a neural network with three or more layers?
c. There are many different training methods we can use in the neural networks toolbox. A list can be found at Training methods list. The page mentions that the fastest training function is generally 'trainlm'; however, generally speaking, which one will perform best? Or does it depend entirely on the data set I am using?
d. In each training method, there is a parameter called 'epochs', which as I understand it is the number of training iterations. For each training method, Matlab defines a maximum number of epochs to train. However, from the example, it seems like 'epochs' is another parameter we can tune. Am I right? Or do we just set the maximum number of epochs, or leave it at the default?
Any experience with the Matlab neural networks toolbox is welcome, and thanks very much for your reply. A.
a. You can refer to How to choose number of hidden layers and nodes in neural network? and ftp://ftp.sas.com/pub/neural/FAQ3.html#A_hu
You can certainly use cross-validation to determine the best number of neurons. However, it is not recommended, as cross-validation is better suited to the weight-training stage of a given network.
b. Refer to ftp://ftp.sas.com/pub/neural/FAQ3.html#A_hl
For neural networks with more layers, you can refer to Deep Learning, which has been very popular in recent years and achieves state-of-the-art performance in many pattern recognition tasks.
c. It depends on your data. trainlm performs better on function fitting (nonlinear regression) problems than on pattern recognition problems. For training large networks and pattern recognition networks, trainscg and trainrp are good choices. Generally, Gradient Descent and Resilient Backpropagation are recommended. A more detailed comparison can be found here: http://www.mathworks.cn/cn/help/nnet/ug/choose-a-multilayer-neural-network-training-function.html
d. Yes, you're right: the epochs parameter can be tuned. Generally, you can output the recognition accuracy at every epoch; you will see that it improves more and more slowly, while more epochs mean more computing time. You can then choose a compromise between accuracy and computation time.
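The accuracy versus computation time trade-off in part d can be sketched as a small helper that looks at the accuracy recorded after each epoch and reports where the gains became negligible. This is only an illustration in Python, not Matlab toolbox code: the function name, the thresholds `min_gain` and `patience`, and the accuracy values are all hypothetical.

```python
def choose_epoch_count(accuracy_per_epoch, min_gain=0.005, patience=2):
    """Return the 1-based epoch after which accuracy stopped improving noticeably.

    An epoch counts as an improvement only if it beats the best accuracy so far
    by at least `min_gain`; the scan stops after `patience` epochs without one.
    """
    if not accuracy_per_epoch:
        raise ValueError("need at least one accuracy value")
    best = accuracy_per_epoch[0]
    best_epoch = 1
    stale = 0
    for epoch, accuracy in enumerate(accuracy_per_epoch[1:], start=2):
        if accuracy - best >= min_gain:
            best, best_epoch, stale = accuracy, epoch, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch

# Hypothetical validation accuracies recorded after each of 8 epochs.
accuracies = [0.60, 0.72, 0.80, 0.84, 0.85, 0.852, 0.853, 0.853]
suggested_epochs = choose_epoch_count(accuracies)
```

With these made-up numbers the helper suggests stopping after epoch 5, since later epochs add less than half a percentage point each. Tightening `min_gain` trades more computing time for slightly higher accuracy.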
For part b of your question:
You can use code like this:
net = patternnet([10 15 20]);
This creates a network with 3 hidden layers: the first layer has 10 neurons, the second has 15 neurons, and the third has 20 neurons.