Matlab - low neural network goal affects result - matlab

I've created an OCR with matlab's Neural Networks.
I've used traingdx
net.trainParam.epochs = 8000;
net.trainParam.min_grad = 0.0000;
net.trainParam.goal = 10e-6;
I've noticed that when I use different goals I get different results (as expected of course).
The weird thing is that I found that I have to "play" with the goal value to get good results.
I expected that the lower you go, the better the results and recognition. But I found that if I lower the goal to like 10e-10 i actually get worse recognition results.
Any idea why lowering the goal would decrease the correctness of the Neural Network ?
I think it might have something to do with it trying too hard to it right, so it doesn't work as well with noise and change.

My NN knowledge is a little rusty, but yes, training the network too much will overtrain it. This will make the network work better on the training vectors you give it, but will make it worse for different inputs.
This is why you generally train it on a set of training vectors and then test the quality with a set of testing vectors. You can do the training iteratively: train on the training set to a certain goal accuracy, then check results for your testing set, increase your goal accuracy and repeat. Stop training when your result on the testing set is worse than what you previously had.

Related

How to deal with the randomness of NN training process?

Consider the training process of deep FF neural network using mini-batch gradient descent. As far as I understand, at each epoch of the training we have different random set of mini-batches. Then iterating over all mini batches and computing the gradients of the NN parameters we will get random gradients at each iteration and, therefore, random directions for the model parameters to minimize the cost function. Let's imagine we fixed the hyperparameters of the training algorithm and started the training process again and again, then we would end up with models, which completely differs from each other, because in those trainings the changes of model parameters were different.
1) Is it always the case when we use such random based training algorithms?
2) If it is so, where is the guaranty that training the NN one more time with the best hyperparameters found during the previous trainings and validations will yield us the best model again?
3) Is it possible to find such hyperparameters, which will always yield the best models?
Neural Network are solving a optimization problem, As long as it is computing a gradient in right direction but can be random, it doesn't hurt its objective to generalize over data. It can stuck in some local optima. But there are many good methods like Adam, RMSProp, momentum based etc, by which it can accomplish its objective.
Another reason, when you say mini-batch, there is at least some sample by which it can generalize over those sample, there can be fluctuation in the error rate, and but at least it can give us a local solution.
Even, at each random sampling, these mini-batch have different-2 sample, which helps in generalize well over the complete distribution.
For hyperparameter selection, you need to do tuning and validate result on unseen data, there is no straight forward method to choose these.

How to prevent converge to mean solution for regression problems in CNN?

I am training a CNN for predicting joints on hands. The problem is that my net always converges to the mean value of the training set, and I can only get identical results for different test images. Do you know how to prevent this?
I think you must be using the MSECriterion()? It is the standard l2 (minimum square error) loss. While the CNN tries to predict results, there are multiple modes through which the result can be correct. And what l2 loss does is that it converges to an average of all these modes as that is the most feasible way it can intuitively approach to attain less-penalized results.
The MSE-based solution
appears overly smooth due to the pixel-wise average of
possible solutions in the pixel space
To pick the optimum mode of answer, you can look into adversarial loss LINK. This loss picks the optimum mode based on what it thinks is realistic in terms of the data it has seen.
For further clarification, look at figure 3 in this paper: SRGAN
I was using tensorflow. Was trying to do some regression using simple CNN with one neuron in output layer. Was optimizing mean square error:
cost = tf.reduce_mean(tf.abs(y_prediction - y_output_placeholder))
optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE).minimize(cost)
My problem was I made output placeholder of true values with different shape than output predictions of the net.
placeholder's shape was [None]
predictions's shape was [None, 1].
When I changed placeholder's shape to match the one of prediction output problem was solved.

Patternet not converging to a solution

I need to use a neural network for a binary classifier. I am using Matlab to classify the data, specifically with Patternet. The problem is, the neural network doesn't seem to find a solution. The performance seems to be asymptotic, it doesn't move at all! It's static across the whole training session.
I have had better results with the feedforward net, I get real values as the output and not binary, so I define a threshold (for instance above 0.5 is 1, below 0.5 is zero). Is there a better way to do it?
Why would the feedforward pattern network seems useless for this task but the regular feedforward net for fitting seems a better approach?
You can try standardizing your data. Non-standardized data tends to have the training algorithm get stuck in local maximum.
You can also increase the hidden layer neurons or even number of hidden layers and try.

Accuracy of Neural network Output-Matlab ANN Toolbox

I'm relatively new to Matlab ANN Toolbox. I am training the NN with pattern recognition and target matrix of 3x8670 containing 1s and 0s, using one hidden layer, 40 neurons and the rest with default settings. When I get the simulated output for new set of inputs, then the values are around 0 and 1. I then arrange them in descending order and choose a fixed number(which is known to me) out of 8670 observations to be 1 and rest to be zero.
Every time I run the program, the first row of the simulated output always has close to 100% accuracy and the following rows dont exhibit the same kind of accuracy.
Is there a logical explanation in general? I understand that answering this query conclusively might require the understanding of program and problem, but its made of of several functions to clearly explain. Can I make some changes in the training to get consistence output?
If you have any suggestions please share it with me.
Thanks,
Nishant
Your problem statement is not clear for me. For example, what you mean by: "I then arrange them in descending order and choose a fixed number ..."
As I understand, you did not get appropriate output from your NN as compared to the real target. I mean, your output from NN is difference than target. If so, there are different possibilities which should be considered:
How do you divide training/test/validation sets for training phase? The most division should be assigned to training (around 75%) and rest for test/validation.
How is your training data set? Can it support most scenarios as you expected? If your trained data set is not somewhat similar to your test data sets (e.g., you have some new records/samples in the test data set which had not (near) appear in the training phase, it explains as 'outlier' and NN cannot work efficiently with these types of samples, so you need clustering approach not NN classification approach), your results from NN is out-of-range and NN cannot provide ideal accuracy as you need. NN is good for those data set training, where there is no very difference between training and test data sets. Otherwise, NN is not appropriate.
Sometimes you have an appropriate training data set, but the problem is training itself. In this condition, you need other types of NN, because feed-forward NNs such as MLP cannot work with compacted and not well-separated regions of data very well. You need strong function approximation such as RBF and SVM.

ANN different results for same train-test sets

I'm implementing a neural network for a supervised classification task in MATLAB.
I have a training set and a test set to evaluate the results.
The problem is that every time I train the network for the same training set I get very different results (sometimes I get a 95% classification accuracy and sometimes like 60%) for the same test set.
Now I know this is because I get different initial weights and I know that I can use 'seed' to set the same initial weights but the question is what does this say about my data and what is the right way to look at this? How do I define the accuracy I'm getting using my designed ANN? Is there a protocol for this (like running the ANN 50 times and get an average accuracy or something)?
Thanks
Make sure your test set is large enough compared to the training set (e.g. 10% of the overall data) and check it regarding diversity. If your test set only covers very specific cases, this could be a reason. Also make sure you always use the same test set. Alternatively you should google the term cross-validation.
Furthermore, observing good training set accuracy while observing bad test set accuracy is a sign for overfitting. Try to apply regularization like a simple L2 weight decay (simply multiply your weight matrices with e.g. 0.999 after each weight update). Depending on your data, Dropout or L1 regularization could also help (especially if you have a lot of redundancies in your input data). Also try to choose a smaller network topology (fewer layers and/or fewer neurons per layer).
To speed up training, you could also try alternative learning algorithms like RPROP+, RPROP- or RMSProp instead of plain backpropagation.
Looks like your ANN is not converging to the optimal set of weights. Without further details of the ANN model, I cannot pinpoint the problem, but I would try increasing the number of iterations.