I have been trying to design a neural network but have run into some issues. I am rather a newbie, so if that's OK, I would like some comments on the results I've got from training. This is a simple feed-forward NN with 13 inputs, two hidden layers of 7 neurons each, and two outputs.
I have provided a few graphs to show my results from training a classifier. The graphs look slightly different from what I'd expect from a successful training process.
The first graph shows how the training progressed. It seems all right to me, but the NN stops due to the validation error rather than the minimum gradient; I don't know whether that is good news or not. On the second graph, the gradient seems to be oscillating up and down. Does that mean the NN struggled to find an optimal structure? The ROC graph shows, IMO, very good results in terms of true-positive and false-positive rates; I believe the higher this value, the better. But the confusion matrix is what worries me the most: it reports zero positive detections and only false positives.
What do you think?
My opinion on your question, part by part. Regarding:
The first graph shows how the training progressed. It seems all right to me, but the NN stops due to the validation error rather than the minimum gradient; I don't know whether that is good news or not. On the second graph, the gradient seems to be oscillating up and down. Does that mean the NN struggled to find an optimal structure?
Not necessarily. It could mean that the NN found appropriate weights and then oscillated around them, jumping away from the minimum and coming back closer or further. I would suggest saving some intermediate stages during training so that you can later check what ROC and confusion-matrix results you get from those intermediate stages. Sometimes I have obtained better results that way, sometimes worse.
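If it helps, a minimal sketch of such a saver, assuming a NumPy-based training loop (train_one_epoch is a hypothetical stand-in for your own training step):

    import numpy as np

    def train_with_checkpoints(weights, n_epochs=100, every=10):
        """Train and snapshot the weights every `every` epochs."""
        checkpoints = {}
        for epoch in range(1, n_epochs + 1):
            weights = train_one_epoch(weights)  # hypothetical: one pass over your training data
            if epoch % every == 0:
                snapshot = [w.copy() for w in weights]  # copy so later updates don't overwrite it
                checkpoints[epoch] = snapshot
                np.savez("checkpoint_%03d.npz" % epoch, *snapshot)
        return weights, checkpoints

    # Afterwards, reload each checkpoint and compute ROC / confusion matrix on the validation set.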
The ROC graph shows, IMO, very good results in terms of true-positive and false-positive rates; I believe the higher this value, the better.
Hard to disagree with you on this point.
Regarding:
But the confusion matrix is what worries me the most: it reports zero positive detections and only false positives.
That, IMHO, can also be considered a good result, or even a very good one, because your overall classification percentage is quite fine (for the majority of cases I have worked on, that was the desired result to achieve), but it is also confusing.
Related
I have a genetic algorithm evolving a population of neural networks. Until now I have mutated the weights and biases using random.randn (Python), i.e. adding a random value drawn from a normal distribution with mean 0. It works "well" and I managed to complete my project with it, but wouldn't it be better to use a uniform distribution over a given interval? My intuition is that it would lead to more variety in my networks.
I think this question has no simple answer. With a normal distribution, numbers around the mean have a higher chance of being "selected" by your generator, while a uniform distribution gives roughly equal chance to all numbers in the interval. That much is clear, but whether equal chances mean better results can, in my opinion, only be answered empirically. So I suggest you run experiments with both normal and uniform distributions and judge based on the results.
About variety: I assume you create some random vector which represents the weights, and at the mutation stage you add a random number to it. With a normal distribution centered at 0, that number will most likely be close to the mean, so a mutation will, with high probability, change each element only a little. You get small improvements to the vector, and only occasionally something big shows up. With a uniform distribution the changes are spread more evenly, which leads to more different individuals. The question is, will these individuals be better? I don't know, but I can offer another view. I look at genetic algorithms as an analogy to the theory of evolution, and from that point of view, the accumulation of small improvements with a small probability of a big change seems more appropriate. Think about the situation where a uniform distribution is used but the children have low fitness because of the big changes, so they are not selected when the new generation is created; then you will wait a long time for the one tiny improvement that makes your network work well.
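If you want to compare the two empirically, here is a minimal sketch, assuming the genome is a flat NumPy array of weights (the mutation rate, scale and interval bounds are just illustrative values):

    import numpy as np

    def mutate_gaussian(genome, rate=0.1, scale=0.1):
        """Perturb a fraction `rate` of the genes with small normally distributed steps."""
        mask = np.random.rand(genome.size) < rate
        return genome + mask * scale * np.random.randn(genome.size)

    def mutate_uniform(genome, rate=0.1, low=-0.5, high=0.5):
        """Perturb a fraction `rate` of the genes with uniform steps on [low, high]."""
        mask = np.random.rand(genome.size) < rate
        return genome + mask * np.random.uniform(low, high, genome.size)

    # Run your GA twice, once with each operator, and compare best fitness per generation.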
One more thing: your experiments may show that the uniform (or normal) distribution is better, but such a result may only hold for your current problem, not in general.
I am training a CNN for predicting joints on hands. The problem is that my net always converges to the mean value of the training set, and I can only get identical results for different test images. Do you know how to prevent this?
I think you must be using MSECriterion()? That is the standard L2 (mean squared error) loss. When the CNN tries to predict results, there are multiple modes through which the result can be correct, and what the L2 loss does is converge to an average of all these modes, since that is the prediction that minimizes the overall penalty.
The MSE-based solution appears overly smooth due to the pixel-wise average of possible solutions in the pixel space.
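As a tiny numerical illustration of that averaging effect (made-up numbers, not from your data): if a joint coordinate is 10 in half of otherwise identical training cases and 30 in the other half, the constant prediction that minimizes the squared error is 20, the mean, which matches neither mode:

    import numpy as np

    targets = np.array([10.0, 30.0])       # two equally likely "modes" for a joint coordinate
    candidates = np.linspace(0.0, 40.0, 401)
    mse = [np.mean((targets - c) ** 2) for c in candidates]
    best = candidates[int(np.argmin(mse))]
    print(best)                            # ~20.0: the mean of the modes, not a mode itself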
To pick the optimum mode instead, you can look into an adversarial loss LINK. This loss picks the mode it considers realistic in terms of the data it has seen.
For further clarification, look at figure 3 in this paper: SRGAN
I was using TensorFlow, trying to do some regression with a simple CNN with one neuron in the output layer, and was minimizing this cost:
cost = tf.reduce_mean(tf.abs(y_prediction - y_output_placeholder))
optimizer = tf.train.AdamOptimizer(learning_rate=LEARNING_RATE).minimize(cost)
My problem was that I had given the placeholder for the true values a different shape than the network's predictions: the placeholder's shape was [None], while the predictions' shape was [None, 1]. Because of broadcasting, the difference then had shape [None, None] instead of [None, 1], so the cost was computed over the wrong tensor. When I changed the placeholder's shape to match the prediction output, the problem was solved.
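A minimal sketch of what the corrected shapes look like (the names are illustrative stand-ins, not my exact code):

    import tensorflow as tf  # TF1-style graph API, as in the snippet above

    # stand-in for the network's output; in the real code this comes from the CNN
    y_prediction = tf.placeholder(tf.float32, shape=[None, 1])

    # give the label placeholder the same rank as the prediction tensor
    y_output_placeholder = tf.placeholder(tf.float32, shape=[None, 1])

    # with matching shapes the difference is [None, 1], not an accidentally broadcast [None, None]
    cost = tf.reduce_mean(tf.abs(y_prediction - y_output_placeholder))

    # alternatively, keep the [None] labels and squeeze the prediction instead:
    # cost = tf.reduce_mean(tf.abs(tf.squeeze(y_prediction, axis=1) - labels_1d))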
Having a neural network with a lot of inputs causes problems like this: the network gets stuck, and the feed-forward calculation always gives an output of 1.0 because the weighted sum is too big; during backpropagation the sum of gradients is also too high, which makes the learning steps too dramatic.
The network uses tanh as the activation function in all layers.
After giving it a lot of thought, I came up with the following solutions:
Initializing smaller random weight values (WeightRandom / PreviousLayerNeuronCount)
or
After calculating the sum of either outputs or gradients, dividing it by the number of neurons in the previous layer (for the output sum) or in the next layer (for the gradient sum), and only then passing the sum into the activation/derivative function.
I don't feel comfortable with the solutions I came up with. Solution 1 does not solve the problem entirely; the possibility of the gradient or output sum getting too high is still there. Solution 2 seems to solve the problem, but I fear it changes the network's behavior so much that it might no longer be able to solve some problems.
What would you suggest in this situation, keeping in mind that reducing the neuron count in the layers is not an option?
Thanks in advance!
General things that affect backpropagation's result include the initial choice of weights and biases, the number of hidden units, the number of training patterns, and the number of iterations. For selecting the initial weights and biases there are several algorithms you can use, one of which is the Nguyen-Widrow algorithm. You can use it to initialize the weights and biases; I have tried it and it gives good results.
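For illustration, a minimal sketch of Nguyen-Widrow initialization for a single layer in NumPy (my reading of the algorithm, so verify against a reference; the layer sizes are just an example):

    import numpy as np

    def nguyen_widrow_init(n_in, n_neurons):
        """Nguyen-Widrow initialization for one fully connected layer."""
        beta = 0.7 * n_neurons ** (1.0 / n_in)               # scale factor from the algorithm
        w = np.random.uniform(-0.5, 0.5, (n_neurons, n_in))  # small random start
        norms = np.linalg.norm(w, axis=1, keepdims=True)
        w = beta * w / norms                                  # rescale each neuron's weights to length beta
        b = np.random.uniform(-beta, beta, n_neurons)
        return w, b

    w1, b1 = nguyen_widrow_init(n_in=20, n_neurons=10)        # example sizes only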
I am experimenting with neural networks. I have a network with 8 input neurons, 5 hidden and 2 output. When I let the network learn with backpropagation, it sometimes produces worse results between single iterations of training. What can be the cause? It should not be an implementation error, because I even tried the implementation from Introduction to Neural Networks for Java and it does exactly the same.
Nothing is wrong. Backpropagation is just gradient-based optimization, and gradient methods have no guarantee of making the error smaller in each iteration (you do have a guarantee that there exists a very small step size/learning rate with that property, but in practice there is no way of finding it). Furthermore, you are probably updating the weights after each sample, making your training stochastic, which is even more "unstable" in this respect (as you never compute the true gradient). However, if your method is not converging because of this, think about properly scaling your data, reducing the learning rate, and probably adding a momentum term. These are general gradient-based optimization issues, not problems with BP as such.
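For the last two suggestions, a minimal sketch of a per-sample update with a smaller learning rate and a momentum term (NumPy; grad_fn is a hypothetical stand-in for the gradient your backpropagation code returns):

    import numpy as np

    def sgd_momentum_step(weights, velocity, grad, lr=0.01, momentum=0.9):
        """One stochastic update: the velocity accumulates past gradients and damps oscillations."""
        velocity = momentum * velocity - lr * grad
        return weights + velocity, velocity

    # usage sketch:
    # velocity = np.zeros_like(weights)
    # for x, y in training_samples:
    #     weights, velocity = sgd_momentum_step(weights, velocity, grad_fn(weights, x, y))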
When you're using Haar-like features for your training data for an Adaboost algorithm, how do you build your data sets? Do you literally have to find thousands of positive and negative samples? There must be a more efficient way of doing this...
I'm trying to analyze images in MATLAB (not faces) and am relatively new to image processing.
Yes, you do need many positive and negative samples for training. This is especially true for Adaboost, which works by repeatedly resampling the training set. How many samples is enough is hard to say. But generally, the more the better, because that increases the chances of your training set being representative.
Also, it seems to me that your quest for efficiency is misplaced. Training is done ahead of time, presumably off-line. It is the efficiency of classifying unknown instances after the training is done, that people usually worry about.
Undoubtedly, more data and more information give better results, so you should include as much information as possible. However, one thing you may need to take care of is the ratio of the positive set to the negative set. For logistic regression, the ratio should not exceed 1:5; for AdaBoost I'm not really sure, but the result will certainly change with the ratio (I have tried it).
Yes, we need many positive and negative samples for the training, but collecting that data is very tedious. You can make it easier by recording videos instead of taking individual pictures and using ffmpeg to split the videos into frames. That makes building the training set much easier.
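For example, a minimal sketch of that extraction step, calling ffmpeg from Python (the file name and frame rate are just illustrative):

    import subprocess
    from pathlib import Path

    Path("frames").mkdir(exist_ok=True)

    # extract 5 frames per second from the recording into numbered PNG files
    subprocess.run(
        ["ffmpeg", "-i", "hand_samples.mp4", "-vf", "fps=5", "frames/sample_%04d.png"],
        check=True,
    )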
The main reason to have roughly equal numbers of positive and negative samples is to avoid bias: sometimes you get high accuracy, but the classifier completely fails on one category. To evaluate such cases, precision/recall are more useful than accuracy.