How is the total R-score factor calculated in the neural network?
For example, for the test data it is computed between the test outputs and the actual (target) outputs.
For validation, it is computed between the validation outputs and the targets.
What criterion is used for the total R-score that is reported in MATLAB?
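For reference, the R values in MATLAB's regression plots are correlation coefficients between network outputs and targets: each subset's R uses only that subset's samples, while the total ("All") R is computed over every sample, training, validation and test together. A minimal sketch, assuming a network net and the training record tr returned by [net, tr] = train(net, x, t):
y = net(x);                                                      % outputs for all samples
rAll  = regression(t, y, 'one');                                 % total ("All") R value
rTest = regression(t(:, tr.testInd), y(:, tr.testInd), 'one');   % test-subset R
rVal  = regression(t(:, tr.valInd),  y(:, tr.valInd),  'one');   % validation-subset R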
Related
If I standardise the training data before I train the neural network, do I then de-standardise the training data after training and feed it back into the network to show the final modelled and expected results? Or do I feed the standardised training data back in and de-standardise the final results and expected results afterwards?
You never de-standardise input data. The network (or any other machine learning model) won't understand data in a different scale/space than the one used during training.
If, however, you scaled the output (target) values, then obviously you have to scale them back in order to obtain "unscaled" results.
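A minimal sketch of the second option, with made-up data and hypothetical variable names: compute z-score statistics from the training set, train and query the network entirely in the standardised space, and de-standardise only the outputs:
xTrain = rand(3, 500);                    % made-up inputs (3 features, 500 samples)
tTrain = 10 * sum(xTrain, 1) + 5;         % made-up target on a different scale
muX = mean(xTrain, 2);  sdX = std(xTrain, 0, 2);
muT = mean(tTrain, 2);  sdT = std(tTrain, 0, 2);
xS = (xTrain - muX) ./ sdX;               % the network only ever sees this space
tS = (tTrain - muT) ./ sdT;
net = fitnet(10);                         % any fitting network will do for the sketch
net = train(net, xS, tS);
yS = net(xS);                             % feed the standardised inputs back in
y  = yS .* sdT + muT;                     % de-standardise only the final outputs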
I trained my artificial neural network (ANN) in MATLAB with 652,500 data points, and in another blind test (652,100 data points - completely new input data sets) the output is excellent (as I want). But the problem occurs when I insert a very small amount of data (for example, below 50 data points): the output is quite unexpected, and I have checked it many times.
To be more precise, the training phase uses 10% of the data for training, 45% for validation and 45% for testing. The training is quite successful, and for large amounts of new input data it works very well. The problem is that when very limited data (compared to the training data points) are fed into the neural network, it shows quite unrealistic output, beyond the range it was trained on.
Why is this so? Could anyone shed some light on this, please?
Also, please mention whether there are any strict (hard and fast) rules on training and final testing data points. For example: what percentage of the training data should/must be present in the new input data sets? I guess the problem is that my network overestimates or underestimates the output because it receives a very small amount of data compared to the training phase.
Your problem is over-fitting during training. Data division is a very important part of training a neural network. In general, and more scientifically, the training set should be 70-80% of the data, and the test and validation sets should each be around 10-15%. For instance:
net.divideParam.trainRatio = 70/100;
net.divideParam.valRatio = 15/100;
net.divideParam.testRatio = 15/100;
Imagine a student in a class. trainRatio is the share of the material/lectures that the student must learn. valRatio is the share of the material examined in the mid-term examination, and testRatio is the share examined in the final examination. So, if the student does not have enough material to learn from, they cannot succeed in the mid-term or final examinations. Is it clear? A neural network learns just like this student, so your network runs into over-fitting problems.
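Put together, a minimal runnable sketch with made-up data (fitnet and the layer size are placeholders for your own setup):
x = rand(5, 1000);                        % made-up inputs
t = sum(x, 1);                            % made-up target
net = fitnet(10);                         % one hidden layer, 10 neurons
net.divideParam.trainRatio = 70/100;      % the split suggested above
net.divideParam.valRatio   = 15/100;
net.divideParam.testRatio  = 15/100;
[net, tr] = train(net, x, t);             % tr records which samples went where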
I'm relatively new to the MATLAB ANN Toolbox. I am training the NN for pattern recognition, with a 3x8670 target matrix containing 1s and 0s, using one hidden layer with 40 neurons and otherwise default settings. When I get the simulated output for a new set of inputs, the values are around 0 and 1. I then arrange them in descending order and choose a fixed number (which is known to me) out of the 8670 observations to be 1 and the rest to be zero.
Every time I run the program, the first row of the simulated output always has close to 100% accuracy, but the following rows don't exhibit the same kind of accuracy.
Is there a logical explanation in general? I understand that answering this query conclusively might require an understanding of the program and the problem, but it is made up of several functions, which makes it hard to explain clearly. Can I make some changes in the training to get consistent output?
If you have any suggestions please share it with me.
Thanks,
Nishant
Your problem statement is not clear to me. For example, what do you mean by: "I then arrange them in descending order and choose a fixed number ..."?
As I understand it, you did not get appropriate output from your NN compared to the real target; I mean, your NN output differs from the target. If so, there are several possibilities that should be considered:
How do you divide the training/test/validation sets for the training phase? The largest share should be assigned to training (around 75%) and the rest to test/validation.
What is your training data set like? Does it cover most of the scenarios you expect? If your training data is not reasonably similar to your test data (e.g., the test set contains new records/samples that never, or hardly ever, appeared during the training phase), such samples count as outliers, and a NN cannot work efficiently with them; you would need a clustering approach rather than a NN classification approach. In that case your NN results go out of range, and the NN cannot provide the accuracy you need. A NN is good for data sets where the training and test data are not very different; otherwise, a NN is not appropriate.
Sometimes you have an appropriate training data set, but the problem is the training itself. In this case you need other types of NN, because feed-forward NNs such as the MLP cannot handle compact, poorly separated regions of data very well. You need stronger function approximators such as RBF networks and SVMs.
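For reference, the ranking step described in the question could look like the following sketch (the scores and k here are made up):
scores = rand(1, 8670);                   % stand-in for one row of simulated output
k = 120;                                  % the fixed, known number of positives
[~, order] = sort(scores, 'descend');
labels = zeros(size(scores));
labels(order(1:k)) = 1;                   % top-k outputs become 1, the rest 0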
Such a problem: I've trained an ANN using an MSE stopping criterion down to a "desired error" of 10^-5 (5 MB of training data, 15,000 input items, a long training period -- about a day). I got 0 bit fails during training. I saved the ANN to a file.
Then I loaded the net from the file and checked it on the same training data. Sometimes I get up to 5 bit fails (not so seldom, BTW!).
What is this? Has anybody met such a phenomenon?
I suspect this is a rounding artefact: many thousands of weights saved to the file in text format and loaded back...
Solved.
The MSE after fann_reset_MSE() and fann_test_data() has no relation to the error returned by fann_train(). If the ANN is trained down to a very low MSE, then fann_get_MSE() and fann_get_bit_fail() agree more or less with the values returned by these functions after fann_reset_MSE() and fann_test_data(). If not (the ANN is not trained well), these values might differ by orders of magnitude.
I'm training an XOR neural network via back-propagation using stochastic gradient descent. The weights of the neural network are initialized to random values between -0.5 and 0.5. The neural network successfully trains itself around 80% of the time. However, sometimes it gets "stuck" while back-propagating. By "stuck", I mean that I start seeing a decreasing rate of error correction. For example, during a successful training run, the total error decreases rather quickly as the network learns, like so:
...
...
Total error for this training set: 0.0010008071327708653
Total error for this training set: 0.001000750550254843
Total error for this training set: 0.001000693973929822
Total error for this training set: 0.0010006374037948094
Total error for this training set: 0.0010005808398488103
Total error for this training set: 0.0010005242820908169
Total error for this training set: 0.0010004677305198344
Total error for this training set: 0.0010004111851348654
Total error for this training set: 0.0010003546459349181
Total error for this training set: 0.0010002981129189812
Total error for this training set: 0.0010002415860860656
Total error for this training set: 0.0010001850654351723
Total error for this training set: 0.001000128550965301
Total error for this training set: 0.0010000720426754587
Total error for this training set: 0.0010000155405646494
Total error for this training set: 9.99959044631871E-4
Testing trained XOR neural network
0 XOR 0: 0.023956746649767453
0 XOR 1: 0.9736079194769579
1 XOR 0: 0.9735670067093437
1 XOR 1: 0.045068688874314006
However, when it gets stuck, the total error is still decreasing, but at an ever-decreasing rate:
...
...
Total error for this training set: 0.12325486644721295
Total error for this training set: 0.12325486642503929
Total error for this training set: 0.12325486640286581
Total error for this training set: 0.12325486638069229
Total error for this training set: 0.12325486635851894
Total error for this training set: 0.12325486633634561
Total error for this training set: 0.1232548663141723
Total error for this training set: 0.12325486629199914
Total error for this training set: 0.12325486626982587
Total error for this training set: 0.1232548662476525
Total error for this training set: 0.12325486622547954
Total error for this training set: 0.12325486620330656
Total error for this training set: 0.12325486618113349
Total error for this training set: 0.12325486615896045
Total error for this training set: 0.12325486613678775
Total error for this training set: 0.12325486611461482
Total error for this training set: 0.1232548660924418
Total error for this training set: 0.12325486607026936
Total error for this training set: 0.12325486604809655
Total error for this training set: 0.12325486602592373
Total error for this training set: 0.12325486600375107
Total error for this training set: 0.12325486598157878
Total error for this training set: 0.12325486595940628
Total error for this training set: 0.1232548659372337
Total error for this training set: 0.12325486591506139
Total error for this training set: 0.12325486589288918
Total error for this training set: 0.12325486587071677
Total error for this training set: 0.12325486584854453
While I was reading up on neural networks, I came across a discussion of local minima and global minima, and how neural networks don't really "know" which minimum they are supposed to be heading towards.
Is my network getting stuck in a local minimum instead of the global minimum?
Yes, neural networks can get stuck in local minima, depending on the error surface. However, this abstract suggests that there are no local minima in the error surface of the XOR problem. I cannot get to the full text, though, so I cannot verify what the authors did to prove this and how it applies to your problem.
There might also be other factors leading to this problem. For example, if you descend very fast into some steep valley and you use only first-order gradient descent, you might jump to the opposite slope and bounce back and forth all the time. You could also try reporting the average change over all weights at each iteration, to test whether you really have a "stuck" network, or rather one that has just run into a limit cycle.
You should first try fiddling with your parameters (learning rate, momentum if you implemented it, etc.). If you can make the problem go away by changing the parameters, your algorithm is probably OK.
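A sketch of that kind of tweak in MATLAB: a gradient step with momentum, plus the mean weight change suggested above for telling a stuck network from a limit cycle. The toy gradient is a stand-in; substitute your own backprop dE/dw:
eta = 0.5;  alpha = 0.9;                  % learning rate and momentum (tune these)
w   = rand(9, 1) - 0.5;                   % e.g. the 9 weights of a 2-2-1 XOR net
vel = zeros(size(w));
for epoch = 1:1000
    grad = w;                             % stand-in gradient (of 0.5*||w||^2);
                                          % replace with your backprop dE/dw
    vel = alpha * vel - eta * grad;       % momentum damps the bouncing described above
    w   = w + vel;
end
meanChange = mean(abs(vel))               % a steady non-zero value suggests a limit cycle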
Poor gradient descent with excessively large steps, as described by LiKao, is one possible problem. Another is that there are very flat regions of the XOR error landscape, which means that it takes a very long time to converge; in fact, the gradient may be so weak that the descent algorithm doesn't pull you in the right direction.
These two papers look at 2-1-1 and 2-2-1 XOR landscapes. One uses a "cross-entropy" error function that I am not familiar with. In the first, they declare there are no local minima, but in the second they say there are local minima at infinity - basically, when weights run off to very large values. So for the second case, their results suggest that if you don't start off near enough to the true minima, you may get trapped at the infinite points. They also say that other analyses of 2-2-1 XOR networks that show no local minima are not contradicted by their results, because of the particular definitions used.
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.31.4770
http://www.ncbi.nlm.nih.gov/pubmed/12662806
I encountered the same issue and found that using the activation function 1.7159*tanh(2/3*x) described in LeCun's "Efficient Backprop" paper helps. This is presumably because that function does not saturate around the target values {-1, 1}, whereas regular tanh does.
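A small sketch of that activation and its derivative (using d/dx tanh(bx) = b(1 - tanh(bx)^2)); note that f(1) is almost exactly 1, so targets of +/-1 sit in the non-saturated region:
f  = @(x) 1.7159 * tanh((2/3) * x);              % LeCun's scaled tanh
df = @(x) 1.7159 * (2/3) * (1 - tanh((2/3) * x).^2);
f(1)                                             % ~= 0.9999, close to the target 1
df(1)                                            % ~= 0.76, a healthy gradient at the target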
The paper by Hamey cited in LiKao's answer proves there are no strict "regional local minima" for XOR in a 2-2-1 neural network. However, it admits "asymptotic minima", where the error surface flattens out as one or more weights approach infinity.
In practice, the weights don't even need to be that large for this to happen, and it is quite common for a 2-2-1 net to get stuck in this flat asymptotic region. The reason for this is saturation: the gradient of the sigmoid activation approaches 0 as the weights get large, so the network is unable to keep learning.
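A quick numerical illustration of that saturation with the standard logistic sigmoid (the pre-activation values are arbitrary):
sig  = @(x) 1 ./ (1 + exp(-x));
dsig = @(x) sig(x) .* (1 - sig(x));       % sigmoid gradient
dsig([0 2 5 10])                          % ~ 0.25, 0.105, 0.0066, 4.5e-5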
See my notebook experiment - typically around 2 or 3 out of 10 networks end up stuck, even after 10,000 epochs. The results differ slightly if you change the learning rate, batch size, activation or loss functions, initial weights, whether inputs are presented in random or fixed order, etc., but usually a network gets stuck now and then.