MATLAB neural network training

What is the difference between the following two pieces of code? Is it better to modify the number of epochs in the training structure, or to put the training function in a loop?
Thank you
First code:
for i = 1:10
    % Train the network
    [net, tr] = train(net, inputs, targets);
end
Second code:
net.trainParam.epochs = 200;
[net, tr] = train(net, inputs, targets);

If the inputs and targets you provide describe a model that is very hard to train, then there is theoretically no difference between the first and the second piece of code. This assumes that the network hits the maximum number of iterations / epochs on each pass through the for loop.
In that case, the first piece of code simply takes the network trained at the previous iteration and uses it for the next one. Because training did not converge, it picks up "where it left off". In the second piece of code, you set the total number of epochs up front and let training happen only once.
However, depending on your inputs and targets, training may take fewer epochs than the maximum you set. For example, if you set the maximum number of epochs to, say, 100, and training converged after only 35 epochs at the first iteration of your loop, the subsequent iterations will not change the network at all, so they amount to unnecessary computation.
As such, if your network is easy to train, just use the second piece of code. If your network is difficult to train, setting one huge epoch count and training in a single call may take a long time to converge; it can be wiser to reduce the number of epochs and place the call to train inside a for loop, making incremental changes.
In short: use the second piece of code if the network is fairly simple to train, and use the first piece of code, with a reduced number of epochs inside a for loop, for networks that are harder to train.
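The trade-off can be sketched outside MATLAB with a toy gradient-descent "network" (everything here is illustrative Python, not MATLAB's train): short training rounds in an outer loop, stopping once further rounds no longer change the loss.

```python
def train_round(w, lr=0.1):
    """One short 'round' of gradient descent on the toy loss f(w) = (w - 3)^2."""
    for _ in range(20):              # a reduced number of epochs per round
        grad = 2 * (w - 3)
        w -= lr * grad
    return w, (w - 3) ** 2           # updated weight and its loss

w, prev_loss = 10.0, float("inf")
for i in range(10):                  # outer loop, like the MATLAB for loop
    w, loss = train_round(w)
    if prev_loss - loss < 1e-12:     # converged: extra rounds are wasted work
        break
    prev_loss = loss
```

With the convergence check, the loop stops after a few rounds instead of running all ten, which is exactly the "unnecessary computation" the answer above describes.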

Related

How is the loss in an RNN/LSTM calculated?

I'm learning how an LSTM works by practicing with time series training data (the input is a list of features and the output is a scalar).
There is something I couldn't understand about calculating the loss for an RNN/LSTM:
How is the loss calculated? Is it calculated each time I give the network a new input, or accumulated over all the given inputs and then backpropagated?
The other answer is correct. However, in an LSTM, or any RNN architecture, the loss for each instance is summed across all time steps. In other words, you'll have a loss at each time step (L_0 at t_0, L_1 at t_1, ..., L_T at t_T) for each sample in your input batch. Sum those losses separately for each instance in the batch, then average the per-instance losses to get the average loss for the current batch.
For more information please visit: https://stanford.edu/~shervine/teaching/cs-230/cheatsheet-recurrent-neural-networks
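As a sketch of that bookkeeping (in Python/NumPy, with toy squared-error values standing in for the real per-time-step losses): sum over time steps within each sequence, then average over the batch.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, T = 4, 5
preds   = rng.normal(size=(batch, T))   # one prediction per sample per time step
targets = rng.normal(size=(batch, T))

per_step   = (preds - targets) ** 2     # loss L_t for every sample and time step
per_sample = per_step.sum(axis=1)       # sum the losses over time, per sequence
batch_loss = per_sample.mean()          # then average across the batch
```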
The answer does not depend on the neural network model.
It depends on your choice of optimization method.
If you are using batch gradient descent, the loss is averaged over the whole training set. This is often impractical for neural networks, because the training set is too big to fit into RAM, and each optimization step takes a lot of time.
In stochastic gradient descent, the loss is calculated for each new input. The problem with this method is that it is noisy.
In mini-batch gradient descent, the loss is averaged over each new minibatch - a subsample of inputs of some small fixed size. Some variation of this method is typically used in practice.
So, the answer to your question depends on the minibatch size you choose.
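A minimal sketch of the mini-batch variant (Python, with a toy linear model; all names here are illustrative): the gradient driving each update is averaged over one minibatch, not over the whole training set and not computed from a single sample.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2.0 * x                              # data from a true model y = 2x

w, lr, batch_size = 0.0, 0.1, 10
for epoch in range(50):
    order = rng.permutation(len(x))      # reshuffle into minibatches each epoch
    for start in range(0, len(x), batch_size):
        idx = order[start:start + batch_size]
        err = w * x[idx] - y[idx]
        # gradient of the squared error averaged over this minibatch only
        grad = 2 * np.mean(err * x[idx])
        w -= lr * grad
```

Setting batch_size to 1 gives stochastic gradient descent; setting it to len(x) gives batch gradient descent.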

When training a neural net what could cause the net to diverge from the target as opposed to converge?

I'm using a stochastic (incremental, as opposed to batch) approach to training my neural net, and after every 1,000,000 iterations I print the sum of the errors of the neurons in the net. For a while I can see this overall error decreasing steadily; then, as progress begins to slow, it seems to reverse completely and the overall error begins increasing steadily. This cannot be normal behavior, and I'm not sure what could be causing it. My learning rate is set very low, at 0.0001.

Increasing the number of epochs to reach the performance goal while training neural network

I am training the neural network with an input matrix of size 85×650 and a target matrix of size 26×650. Here is the list of parameters that I have used:
net.trainParam.max_fail = 6;
net.trainParam.min_grad = 1e-5;
net.trainParam.show = 10;
net.trainParam.lr = 0.9;
net.trainParam.epochs = 13500;
net.trainParam.goal = 0.001;
Number of hidden nodes = 76
As you can see, I have set the number of epochs to 13500. Is it OK to set the number of epochs to such a large number? The performance goal is not reached if the number of epochs is decreased, and I get bad classification results when testing.
Try not to focus on the number of epochs. Instead, you should have at least two sets of data: one for training and another for testing. Use the testing set to get a feel for how well your ANN is performing and how many epochs are needed to get a decent ANN.
For example, you want to stop training when performance on your testing set has levelled off or has begun to decrease (get worse). That would be evidence of over-learning, which is the reason why more epochs are not always better.
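That stopping rule can be sketched as follows (Python, with a made-up sequence of per-epoch test-set errors): remember the epoch with the best error, and stop once the error has failed to improve for a couple of consecutive checks.

```python
# Synthetic per-epoch test-set errors: improving at first, then getting worse.
val_errors = [0.9, 0.6, 0.4, 0.35, 0.34, 0.36, 0.39, 0.41]

patience, best, best_epoch, stalled = 2, float("inf"), -1, 0
stop_epoch = len(val_errors) - 1
for epoch, err in enumerate(val_errors):
    if err < best:
        best, best_epoch, stalled = err, epoch, 0   # new best: reset the counter
    else:
        stalled += 1
        if stalled >= patience:                     # levelled off / begun to rise
            stop_epoch = epoch
            break
```

The network you keep is the one saved at best_epoch, not the one from the last epoch trained.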

Conceptual issues on training neural network wih particle swarm optimization

I have a 4-input, 3-output neural network trained by particle swarm optimization (PSO), with mean squared error (MSE) as the fitness function, using the IRIS database provided by MATLAB. The fitness function is evaluated 50 times. The experiment is to classify features. I have a few doubts:
(1) Does the PSO iterations/generations = number of times the fitness function is evaluated?
(2) In many papers I have seen the training curve of MSE vs. generations being plotted. In the picture, graph (a) on the left side is a model similar to a NN: a 4-input, 0-hidden-layer, 3-output cognitive map. Graph (b) is a NN trained by the same PSO. The purpose of that paper was to show the effectiveness of the new model in (a) over the NN.
But they mention that the experiment is conducted with, say, Cycles = 100 and Generations = 300. In that case, shouldn't the training curves for (a) and (b) have been MSE vs. cycles rather than MSE vs. PSO generations? For example, Cycle 1: PSO iterations 1-50 --> Result (Weights_1, Bias_1, MSE_1, Classification Rate_1). Cycle 2: PSO iterations 1-50 --> Result (Weights_2, Bias_2, MSE_2, Classification Rate_2), and so on for 100 cycles. Why is the X axis in (a) and (b) different, and what do the axes mean?
(3) Lastly, for every independent run of the program (running the m-file several times independently from the console), I never get the same classification rate (CR) or the same set of weights. Concretely, when I first run the program I get a set of weights W and CR = 100%. When I run the MATLAB code again, I may get CR = 50% and another set of weights, as shown below for example:
%Run1 (PSO generations 1-50)
>>PSO_NN.m
Correlation =
0
Classification rate = 25
FinalWeightsBias =
-0.1156 0.2487 2.2868 0.4460 0.3013 2.5761
%Run2 (PSO generations 1-50)
>>PSO_NN.m
Correlation =
1
Classification rate = 100
%Run3 (PSO generations 1-50)
>>PSO_NN.m
Correlation =
-0.1260
Classification rate = 37.5
FinalWeightsBias =
-0.1726 0.3468 0.6298 -0.0373 0.2954 -0.3254
What is the correct method? Which weight set should I finally take, and how do I say that the network has been trained? I am aware that evolutionary algorithms, due to their randomness, will never give the same answer, but then how do I ensure that the network has been trained?
I shall be obliged for clarification.
As in most machine learning methods, the number of iterations in PSO is the number of times the solution is updated; in the case of PSO, this is the number of update rounds over all particles. The cost function is evaluated after every particle update, so the number of evaluations is larger than the number of iterations: approximately, (# cost function calls) = (# iterations) × (# particles).
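That bookkeeping can be sketched with a tiny 1-D PSO (Python; the particle count, coefficients, and cost function are all illustrative): each iteration evaluates the cost once per particle, on top of one initial evaluation round.

```python
import random

def cost(x):
    cost.calls += 1                       # count every fitness evaluation
    return (x - 1.0) ** 2                 # toy cost, minimum at x = 1
cost.calls = 0

random.seed(0)
n_particles, n_iters = 5, 30
pos = [random.uniform(-5, 5) for _ in range(n_particles)]
vel = [0.0] * n_particles
pbest, pbest_val = pos[:], [cost(p) for p in pos]    # initial evaluation round
g = pbest_val.index(min(pbest_val))
gbest, gbest_val = pbest[g], pbest_val[g]

for _ in range(n_iters):
    for i in range(n_particles):
        vel[i] = (0.7 * vel[i]
                  + 1.5 * random.random() * (pbest[i] - pos[i])
                  + 1.5 * random.random() * (gbest - pos[i]))
        pos[i] += vel[i]
        v = cost(pos[i])                  # one evaluation per particle per iteration
        if v < pbest_val[i]:
            pbest[i], pbest_val[i] = pos[i], v
            if v < gbest_val:
                gbest, gbest_val = pos[i], v

# total calls = n_particles (initial round) + n_iters * n_particles
```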
The graphs here are comparing different classifiers, fuzzy cognitive maps for graph (a) and a neural network for graph (b). So the X-axis displays the relevant measures of learning iterations for each.
Every time you run your NN you initialize it with different random values, so the results are never the same. The fact that the results vary greatly from one run to the next means you have a convergence problem. The first thing to try in this case is running more iterations. In general, convergence is a rather complicated issue and the solutions vary greatly with applications (read carefully through the answer and comments Isaac gave you on your other question). If the problem persists after increasing the number of iterations, you can post it as a new question, providing a sample of your data and the actual code you use to construct and train the network.
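The run-to-run variation comes from random weight initialization; fixing the RNG seed before initializing makes a run repeatable. Sketched in Python (in MATLAB the analogous step is calling rng(0) before creating the network):

```python
import random

def init_weights(n):
    # stand-in for random weight initialization
    return [random.uniform(-1, 1) for _ in range(n)]

run1 = init_weights(6)        # two unseeded runs: different weights each time
run2 = init_weights(6)

random.seed(0)
seeded1 = init_weights(6)     # seeding first makes the run reproducible
random.seed(0)
seeded2 = init_weights(6)     # identical to seeded1
```

Note that seeding makes runs comparable for debugging, but it does not fix the underlying convergence problem.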

Maximum number of iterations to train a neural network

I have a data set of size N. How can I determine whether the network has finished training on this data set?
Training could go on indefinitely if the data I feed it is random, so I should have a maximum number of iterations after which a neural network can be considered trained, to avoid running forever.
What is the maximum number of iterations after which I can consider the neural network trained?
You will need to define a confidence interval that you are ready to accept. Please read this article for further information: http://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=00478409