How to prevent converging to the mean solution for regression problems in a CNN? - neural-network

I am training a CNN to predict joint positions on hands. The problem is that my net always converges to the mean value of the training set, so I get identical results for every test image. Do you know how to prevent this?

I think you must be using the MSECriterion()? That is the standard l2 (mean squared error) loss. When the CNN makes a prediction, there are often multiple modes that would all count as correct, and what the l2 loss does is converge to an average of all those modes, since that is the least-penalized answer it can give.
The MSE-based solution appears overly smooth due to the pixel-wise average of possible solutions in the pixel space.
To pick the optimal mode instead, you can look into adversarial loss LINK. This loss picks the mode it considers most realistic given the data it has seen.
For further clarification, look at figure 3 in this paper: SRGAN
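As a toy illustration of the mode-averaging effect (purely synthetic numbers, not from the paper): if two different targets are both valid for the same input, the constant prediction that minimizes the squared error is their average.

import numpy as np

# Two equally valid "modes" for the same input.
targets = np.array([0.0, 1.0])

# Scan constant predictions and measure the average squared error against both modes.
candidates = np.linspace(-1.0, 2.0, 301)
mse = [np.mean((targets - c) ** 2) for c in candidates]

print(candidates[np.argmin(mse)])  # ~0.5: the l2-optimal prediction is the mean of the modes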

I was using TensorFlow, trying to do regression with a simple CNN that has one neuron in the output layer, and minimizing the following cost:
cost = tf.reduce_mean(tf.abs(y_prediction - y_output_placeholder))
optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE).minimize(cost)
My problem was that I created the placeholder for the true output values with a different shape than the network's predictions:
the placeholder's shape was [None],
the predictions' shape was [None, 1].
When I changed the placeholder's shape to match that of the prediction output, the problem was solved.
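The reason this collapses to the mean is broadcasting: subtracting a [None] tensor from a [None, 1] tensor yields a [None, None] matrix, so the cost averages every prediction against every label. A minimal NumPy sketch of the effect (toy numbers, not the original model):

import numpy as np

# Hypothetical labels and predictions illustrating the shape mismatch.
y_true = np.array([1.0, 2.0, 3.0])          # shape (3,)   -- like a [None] placeholder
y_pred = np.array([[1.0], [2.0], [3.0]])    # shape (3, 1) -- like [None, 1] predictions

diff = y_pred - y_true                       # broadcasts to shape (3, 3), not (3,)
print(diff.shape)                            # (3, 3)
print(np.mean(np.abs(diff)))                 # ~0.89 even though every prediction is exactly right

# With matching shapes the cost is what you expect:
print(np.mean(np.abs(y_pred.reshape(-1) - y_true)))  # 0.0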

Related

neural network converges too fast and predicts blank results

I am using a UNet model to train a segmentation algorithm with roughly 1,000 grayscale medical images and 1,000 corresponding masks, where the section of interest in the medical image is in white pixels and the background is black.
I am using dice loss, and a matching dice score as the accuracy metric, to account for the fact that my white pixels are generally far fewer than the black background pixels. But I am still having a few problems when training:
1) The loss converges too fast. If I set my SGD optimizer's learning rate to 0.01, for example, at around 2 epochs the loss (training and validation) drops to 0.00009 and the accuracy shoots up and settles at 100%. Testing on an unseen set gives blank images.
Assumption - Overfitting:
I assumed this was due to overfitting, so I augmented the dataset as much as possible with rigid transformations (flipping and rotating), but it still didn't help.
Also, if I test the model on the same data I used to train it, it still predicts blank images. Does that mean it isn't a case of overfitting?
2) The model doesn't look like it's even training. I was able to check the model before it reduced all the test data to blackness, and even then the results looked like blurry versions of the originals, without segmenting the features highlighted by my training masks.
3) The loss-vs-epochs and accuracy-vs-epochs charts are very smooth. They show none of the oscillating behaviour I expect to see when doing semantic segmentation. According to this related post, a smooth chart usually occurs when there is only one class. I assumed, however, that my model would see the training masks (white pixels vs black pixels) as a two-class problem. Am I wrong in this assumption?
4) According to this post, dice is good for an unbalanced training set. I have also tried to get precision/recall/F1 results as they suggest, but was unable to, and I assume that might be related to my third issue, where the model sees my segmentation task as a single-class problem.
TLDR: How can I fix the blank output results I am getting? Can you please help me clarify whether my learning model actually sees the white and black pixels in each mask as two separate classes, and if not, what it is actually doing?
Your model is only predicting one class (the background/black pixels) because of the class imbalance.
The loss converges too fast. If I set my SGD optimizer's learning rate to 0.01, for example, at around 2 epochs the loss (training and validation) drops to 0.00009 and the accuracy shoots up and settles at 100%. Testing on an unseen set gives blank images.
Lower your learning rate. 0.01 is really high, so try something like 3e-5 for your learning rate and see how your model performs.
Also, a 100% accuracy (supposedly you're using dice?) suggests that you're still using plain accuracy as the metric, so I suspect your model is not actually set up with dice/dice loss for training and evaluation (code snippets would be appreciated).
Example:
model.compile(optimizer=Adam(lr=TRAIN_SEG_LEARNING_RATE),
loss=dice_coef_loss,
metrics=[dice_coef])
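If those functions aren't defined yet, a minimal sketch of a dice coefficient and its loss could look like the following (the names dice_coef/dice_coef_loss follow the snippet above; the smoothing constant is an assumption):

from tensorflow.keras import backend as K

def dice_coef(y_true, y_pred, smooth=1.0):
    # Flatten both masks and measure the overlap between prediction and ground truth.
    y_true_f = K.flatten(y_true)
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    return (2.0 * intersection + smooth) / (K.sum(y_true_f) + K.sum(y_pred_f) + smooth)

def dice_coef_loss(y_true, y_pred):
    # Dice is a similarity in [0, 1], so the loss is its complement.
    return 1.0 - dice_coef(y_true, y_pred)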
Also, if I test the model on the same data I used to train it, it still predicts blank images. Does that mean it isn't a case of overfitting?
Try using model.evaluate(test_data, test_label). If the evaluated performance is good (the dice score should be extremely low if you're only predicting 0s), then either your labels are messed up in some way or there is something wrong with your pipeline.
Possible solutions if all else fails:
Make sure to go through all the sanity checks in this article.
You might not have enough data, so try a patchwise approach with random crops (see the sketch after this list).
Add more regularization (dropout, BatchNormalization, InstanceNormalization, increasing the input image size, etc.).
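A rough sketch of patchwise training via random crops (NumPy only; the crop size, function name and toy image sizes are assumptions):

import numpy as np

def random_crop_pair(image, mask, crop_size=128, rng=np.random.default_rng()):
    # Sample the same random window from image and mask so they stay aligned.
    h, w = image.shape[:2]
    top = rng.integers(0, h - crop_size + 1)
    left = rng.integers(0, w - crop_size + 1)
    return (image[top:top + crop_size, left:left + crop_size],
            mask[top:top + crop_size, left:left + crop_size])

# Example: cut a 128x128 patch out of a toy 512x512 image/mask pair.
img, msk = np.zeros((512, 512)), np.zeros((512, 512))
patch_img, patch_msk = random_crop_pair(img, msk)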

Backpropagation neural network, too many neurons in layer causing output to be too high

Having a neural network with a lot of inputs causes me problems: the network gets stuck because the feed-forward calculation always gives an output of 1.0 (the sum feeding the output is too big), and during backpropagation the sum of gradients becomes too high, which makes the learning steps far too dramatic.
The network uses tanh as the activation function in all layers.
After giving it a lot of thought, I came up with the following solutions:
Solution 1: initializing smaller random weight values (WeightRandom / PreviousLayerNeuronCount), or
Solution 2: after calculating the sum of either outputs or gradients, dividing it by the number of neurons in the previous layer (for the output sum) or the number of neurons in the next layer (for the gradient sum), and only then passing the sum into the activation/derivative function.
I don't feel comfortable with either solution.
Solution 1 does not solve the problem entirely; the possibility of the gradient or output sum getting too high is still there. Solution 2 seems to solve the problem, but I fear it changes the network's behaviour so much that it might no longer be able to solve some problems.
What would you suggest in this situation, keeping in mind that reducing the neuron count in the layers is not an option?
Thanks in advance!
General things that affect backpropagation include the initial choice of weights and biases, the number of hidden units, the number of training patterns, and the number of iterations. For choosing the initial weights and biases there are several algorithms you can use, one of which is the Nguyen-Widrow algorithm. You can use it to initialize the weights and biases; I've tried it and it gives good results.
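For reference, a rough sketch of Nguyen-Widrow initialization for one layer (the formula as it is commonly stated; exact details vary between sources, so treat this as an approximation):

import numpy as np

def nguyen_widrow_init(n_inputs, n_hidden, seed=0):
    # Rough sketch of Nguyen-Widrow initialization; details vary between sources.
    rng = np.random.default_rng(seed)
    # Scale factor beta = 0.7 * H^(1/n), with H hidden units and n inputs.
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)
    # Start from small uniform random weights ...
    w = rng.uniform(-0.5, 0.5, size=(n_hidden, n_inputs))
    # ... then rescale each neuron's weight vector to length beta.
    w = beta * w / np.linalg.norm(w, axis=1, keepdims=True)
    # Biases are drawn uniformly from [-beta, beta].
    b = rng.uniform(-beta, beta, size=n_hidden)
    return w, b

w, b = nguyen_widrow_init(n_inputs=20, n_hidden=10)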

Why does my neural network trained on MNIST data set not predict 7 and 9 correctly?

I'm using Matlab (github code repository). The details of the network are:
Hidden units: 100 (variable)
Epochs : 500
Batch size: 100
The weights are being updated using the backpropagation algorithm.
I've been able to recognize 0, 1, 2, 3, 4, 5, 6 and 8, which I drew in Photoshop.
However, 7 and 9 are not recognized, even though on the test set I get only 749/10000 wrong and it correctly classifies 9251/10000.
Any idea what might be wrong? It is learning, and based on the test set results it's learning correctly.
I don't see anything downright incorrect in your code, but there is a lot that can be improved:
You use this to set the initial weights:
hiddenWeights = rand(hiddenUnits,inputVectorSize);
outputWeights = rand(outputVectorSize,hiddenUnits);
hiddenWeights = hiddenWeights./size(hiddenWeights, 2);
outputWeights = outputWeights./size(outputWeights, 2);
This will make your weights very small, I think. Not only that, but you will have no negative values, so you'll throw away half of the sigmoid's range. I suggest you try:
weights = 2*rand(x, y) - 1
which will generate random numbers in [-1, 1]. You can then shrink this interval to get smaller weights (try dividing by the square root of the layer size).
You use this as the output delta:
outputDelta = dactivation(outputActualInput).*(outputVector - targetVector) % (tk-yk)*f'(yin)
Multiplying by the derivative is what you do with the square loss function. For log loss (which is usually the one used in classification), the delta should be just outputVector - targetVector. It might not make that big of a difference, but you might want to try it.
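A quick sketch of the difference for a sigmoid output (toy numbers; the variable names are illustrative, not from the repository):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([0.2, -1.3, 0.7])      # net input to the output layer
y = sigmoid(z)                      # network output
t = np.array([0.0, 0.0, 1.0])       # target vector

delta_square_loss = (y - t) * y * (1.0 - y)  # squared error: multiply by sigmoid'(z)
delta_log_loss = y - t                       # log loss + sigmoid: the derivative cancels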
You say in the comments that the network doesn't detect your own sevens and nines. This can suggest overfitting on the MNIST data. To address this, you'll need to add some form of regularization to your network: either weight decay or dropout.
You should try different learning rates as well, if you haven't already.
You don't seem to have any bias neurons. Each layer, except the output layer, should have a neuron that only returns the value 1 to the next layer. You can implement this by adding another feature to your input data that is always 1.
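For example, a sketch of the extra constant-1 feature trick (the batch shape is an assumption):

import numpy as np

# Hypothetical batch of flattened MNIST images: (n_samples, 784).
X = np.random.rand(100, 784)

# Append a constant-1 column; the weight attached to it acts as the layer's bias.
X_with_bias = np.hstack([X, np.ones((X.shape[0], 1))])
print(X_with_bias.shape)  # (100, 785)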
MNIST is a big data set for which better algorithms are still being researched. Your network is very basic and small, with no regularization, no bias neurons and no improvements over classic gradient descent. It's not surprising that it isn't working too well: you'll likely need a more complex network for better results.
Nothing to do with neural nets or your code, but this picture of KNN nearest-neighbour digits shows that some MNIST digits are simply hard to recognize.

ANN-based navigation system

I am currently working on an indoor navigation system using a Zigbee WSN in a star topology.
I currently have signal-strength data for 60 positions in an area of roughly 15 m by 10 m. I want to use an ANN to help predict the coordinates of other positions. After going through a number of threads, I realized that normalizing the data would give me better results.
I tried that and re-trained my network a few times. I managed to get the goal parameter in MATLAB's nntool down to .000745, but when I feed a training sample back in as a test input and scale the result back, the value is way off.
A value of .000745 means that my data has been fit very closely, right? If yes, why this anomaly? I am dividing by the maximum value to normalize and multiplying by it to scale the value back.
Can someone please explain where I might be going wrong? Am I using the wrong training parameters? (I am using TRAINRP, 4 layers with 15 neurons in each layer, a goal of 1e-8, a gradient of 1e-6 and 100000 epochs.)
Should I consider methods other than an ANN for this purpose?
Please help.
For spatial data you can always use Gaussian process regression. With a proper kernel you can predict pretty well, and GP regression is a pretty simple thing to do (just matrix inversion and matrix-vector multiplication). You don't have much data, so exact GP regression can easily be done. For a nice source on GP regression, check this.
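A minimal scikit-learn sketch of what that could look like (the number of anchors, data shapes and kernel choice are assumptions, and the random arrays stand in for your measurements):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

# Stand-ins for the real data: 60 positions, RSSI from 4 anchors, (x, y) targets in meters.
rng = np.random.default_rng(0)
rssi = rng.uniform(-90, -30, size=(60, 4))
coords = rng.uniform(0, 1, size=(60, 2)) * [15.0, 10.0]

kernel = RBF(length_scale=10.0) + WhiteKernel(noise_level=1.0)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(rssi, coords)

pred, std = gp.predict(rng.uniform(-90, -30, size=(1, 4)), return_std=True)
print(pred, std)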
What did you scale? Inputs or outputs? Did you scale input+output for your training set but only the output while testing?
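In other words, the scale factors should come from the training set only and be reused everywhere, roughly like this (shapes and names are placeholders for your data):

import numpy as np

# Toy stand-ins: 60 training positions, 4 signal strengths, 2 target coordinates.
rng = np.random.default_rng(1)
X_train, y_train = rng.uniform(0, 100, (60, 4)), rng.uniform(0, 1, (60, 2)) * [15.0, 10.0]
X_test = rng.uniform(0, 100, (5, 4))

# Fit the scale factors on the training data only ...
x_max, y_max = X_train.max(axis=0), y_train.max(axis=0)
X_train_n, y_train_n = X_train / x_max, y_train / y_max

# ... reuse the same factors at test time, and only invert the scaling on the prediction.
X_test_n = X_test / x_max
# y_pred = net(X_test_n) * y_max   # hypothetical trained network; output back in meters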
What kind of error measure do you use? I assume your "goal parameter" is an error measure. Is it SSE (sum of squared errors) or MSE (mean squared error)? 0.000745 seems very small, which usually means you have almost no error on your training data.
Your ANN architecture might be too deep with too few hidden units for an initial test. Try different architectures like 40-20 hidden units, 60 HU, 30-20-10 HU, ...
You should generate a test set to verify your ANN's generalization. Otherwise overfitting might be a problem.

Matlab - low neural network goal affects result

I've created an OCR with matlab's Neural Networks.
I've used traingdx
net.trainParam.epochs = 8000;
net.trainParam.min_grad = 0.0000;
net.trainParam.goal = 10e-6;
I've noticed that when I use different goals I get different results (as expected of course).
The weird thing is that I found that I have to "play" with the goal value to get good results.
I expected that the lower you go, the better the results and recognition. But I found that if I lower the goal to something like 10e-10, I actually get worse recognition results.
Any idea why lowering the goal would decrease the correctness of the neural network?
I think it might have something to do with the network trying too hard to get everything exactly right, so it doesn't cope as well with noise and variation.
My NN knowledge is a little rusty, but yes, training the network too much will overtrain it. That will make the network work better on the training vectors you give it, but worse on other inputs.
This is why you generally train on a set of training vectors and then test the quality with a set of test vectors. You can do the training iteratively: train on the training set to a certain goal accuracy, then check the results on your test set, tighten the goal and repeat. Stop training when the result on the test set is worse than what you previously had.
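A rough sketch of that iterative scheme in Python with scikit-learn standing in for the MATLAB toolbox (the dataset, chunk size and patience are assumptions):

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X / 16.0, y, test_size=0.3, random_state=0)

# Train in chunks of epochs; warm_start=True makes each fit() continue from the last weights.
net = MLPClassifier(hidden_layer_sizes=(50,), max_iter=20, warm_start=True)
best_score, rounds_without_improvement = 0.0, 0
for _ in range(50):
    net.fit(X_tr, y_tr)
    score = net.score(X_te, y_te)              # check against the held-out set
    if score > best_score:
        best_score, rounds_without_improvement = score, 0
    else:
        rounds_without_improvement += 1
        if rounds_without_improvement >= 3:
            break                              # test performance stopped improving: stop training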