I am trying to find the optimum number of neurons to use when running the Neural Net Fitting tool in MATLAB's Neural Network app.
I am currently using 62000 samples of 64 elements as input and 62000 samples of 1 element as target. I tried to reproduce results obtained through other means, but the results are not even similar when running the tool with 1-12 neurons. When I ran it with 64 neurons, the results were closer to what was expected.
Is there a way to determine how many neurons to use based on the number of elements/samples?
Any suggestions on how to select the number of neurons when running the tests?
Thanks.
Even for simple datasets like MNIST I will use at minimum 128 neurons. Possible values to check are 128, 256, 512, and maybe 1024. These numbers are just easy to remember and are not magical nor the consequence of a known formula. Alternatively, choose a few random samples from [100, 500] and see which number of neurons works best. Harder tasks tend to require more neurons, and when you have many neurons you need to consider regularizing your network with L_2 regularization or dropout.
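A minimal sketch of that random search, written in Python/scikit-learn rather than the MATLAB tool purely to illustrate the procedure; MLPRegressor, the candidate sizes, and the placeholder data (standing in for your 62000x64 inputs and 62000x1 targets) are my own choices, not anything from the question:

    import numpy as np
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor

    X, y = np.random.rand(5000, 64), np.random.rand(5000)      # placeholder for your data
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

    rng = np.random.default_rng(0)
    candidates = [128, 256, 512] + list(rng.integers(100, 501, size=3))
    scores = {}
    for n in candidates:
        net = MLPRegressor(hidden_layer_sizes=(n,), max_iter=200, random_state=0)
        net.fit(X_tr, y_tr)
        scores[n] = net.score(X_val, y_val)                    # R^2 on held-out data
    print(max(scores, key=scores.get), scores)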
I just coded my first neural network and experimented a little with it... Its task is very simple: it should basically output its input rounded to the nearest integer. It consists of one input neuron and one output neuron, with 1 hidden layer consisting of 2 hidden neurons. At first I gave it about 2000 randomly generated training examples.
When I gave it 3 hidden layers of 10 hidden neurons each, the results started to get worse, and even after 10000 training examples it still output many wrong answers. The neural network with 2 hidden neurons worked way better.
Why does this happen? I thought the more neurons a neural network had, the better it would be...
So how do you find the best number of neurons and hidden layers?
If by "worse" you mean less accuracy on a test set, the problem is most likely overfitting.
In general, I can tell you this: more layers will fit more complicated functions to the data. Maybe your data closely resembles a straight line, so a simple linear function will do great. But imagine you try fitting a 6th-degree polynomial to that data. As you may know, high-degree polynomials grow toward ±infinity very fast, so this high-degree model will predict values that are far too large at the extremes.
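A small illustration of that point, as a hedged numpy sketch (the data and degrees are made up for the example): fit a line and a 6th-degree polynomial to roughly linear data, then evaluate both just outside the training range.

    import numpy as np

    x = np.linspace(0, 1, 20)
    y = 2 * x + 0.1 * np.random.randn(20)        # roughly a straight line, with noise

    linear = np.polyfit(x, y, 1)
    degree6 = np.polyfit(x, y, 6)

    x_new = 1.5                                   # a point beyond the training data
    print(np.polyval(linear, x_new))              # close to 3
    print(np.polyval(degree6, x_new))             # typically far off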
In summary, your problem is most likely overfitting (high variance). You can find more intuitive explanations of the bias-variance tradeoff, with graphs, online.
Quick Google search: https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff
I've built digit recognition (56x56 digit images) using neural networks, but I'm getting 89.5% accuracy on the test set and 100% on the training set. I know that it's possible to get >95% on the test set using this training set. Is there any way to improve my training so I can get better predictions? Changing iterations from 300 to 1000 gave me +0.12% accuracy. I'm also limited by file size, so increasing the number of nodes may not be possible, but if that's the case maybe I could cut some pixels/nodes from the input layer.
To train I'm using:
input layer: 3136 nodes
hidden layer: 220 nodes
labels: 36
regularized cost function with lambda=0.1
fmincg to calculate weights (1000 iterations)
As mentioned in the comments, the easiest and most promising way is to switch to a convolutional neural network. But with your current model you can:
Add more layers with fewer neurons each, which increases learning capacity and should increase accuracy by a bit. The problem is that you might start overfitting; use regularization to counter this.
Use batch normalization (BN). While you are already using regularization, BN accelerates training and also acts as a regularizer, and it is an NN-specific technique that might work better.
Make an ensemble. Train several NNs on the same dataset but with different initializations. This will produce slightly different classifiers, and you can combine their outputs to get a small increase in accuracy (see the sketch after this list).
Cross-entropy loss. You don't mention what loss function you are using; if it's not cross-entropy, then you should start using it. All the high-accuracy classifiers use cross-entropy loss.
Switch to stochastic gradient descent with backpropagation. I do not know the effect of using a different optimization algorithm, but it might outperform the optimizer you are currently using, and you could combine it with adaptive variants such as Adagrad or Adam.
Other small changes that might increase accuracy are changing the activation function (for example to ReLU), shuffling the training samples after every epoch, and doing data augmentation.
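A minimal sketch of the ensemble idea, using scikit-learn networks as stand-ins for your own training code (the placeholder data, hidden size, and number of models are assumptions, not your setup); the key point is simply averaging the predicted class probabilities of several differently initialized models:

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    X, y = np.random.rand(200, 3136), np.random.randint(0, 36, 200)   # placeholder data

    # Same data, different random initializations.
    models = [MLPClassifier(hidden_layer_sizes=(220,), random_state=seed, max_iter=50).fit(X, y)
              for seed in range(5)]

    probs = np.mean([m.predict_proba(X) for m in models], axis=0)     # average class probabilities
    ensemble_pred = models[0].classes_[np.argmax(probs, axis=1)]      # combined prediction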
I am designing an algorithm for OCR using a neural network. I have 100 images (40x20 matrices) of each character, so my input should be 2600x800. I have some questions regarding the inputs and targets.
1) Is my input correct, and can all the 2600 images be used in random order?
2) What should be the target? Do I have to define the target for all 2600 inputs?
3) Since the target for a given character is always the same, what is the final target matrix: (26x800) or (2600x800)?
Your input should be correct. You have (I am guessing) 26 characters and 100 images of size 800 for each, so the matrix looks right. As a side note, that is a pretty big input size; you may want to consider doing PCA and training on the projections onto the leading principal components, or just reducing the size of the images. I have been able to train NNs with 10x10 images, but bigger means more difficult. Try it, and if it doesn't work, try PCA.
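A hedged sketch of that PCA suggestion using scikit-learn (the placeholder matrix and the choice of 100 components are assumptions; tune the number of components on your own data):

    import numpy as np
    from sklearn.decomposition import PCA

    X = np.random.rand(2600, 800)                 # placeholder for your 2600x800 image matrix
    pca = PCA(n_components=100)                   # arbitrary choice; tune it
    X_reduced = pca.fit_transform(X)              # shape (2600, 100), use this for training
    print(pca.explained_variance_ratio_.sum())    # how much variance is kept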
As for 2) and 3): of course, if you want to train a NN you need to give it inputs with outputs; how else are you going to train it? Your output should be of size 26x1 for each of the images, so the output for training should be 2600x26. In each output row you should have a 1 at the index of the character it belongs to and zeros everywhere else.
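A minimal numpy sketch of that target layout (here I generate the labels as 100 images per character in order, which is an assumption about how your data happens to be arranged):

    import numpy as np

    labels = np.repeat(np.arange(26), 100)        # character index (0-25) of each of the 2600 images
    targets = np.zeros((2600, 26))
    targets[np.arange(2600), labels] = 1.0        # one-hot: a 1 in the correct character's column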
I am having a problem in a project which uses pybrain (a Python library for neural networks) to build an ANN and do regression for prediction.
I am using a 3-layer ANN, with 14 inputs, 10 hidden neurons in the hidden layer, and 2 outputs. A typical training or test example looks like this:
Inputs (separated by spaces):
1534334.489 1554790.856 1566060.675 20 20 20 50 45000 -11.399025 13 1.05E-03 1.775475116 20 0
Outputs (separated by spaces):
1571172.296 20
I am using pybrain's BackpropTrainer, so it trains with backpropagation, and I trained until convergence.
The weird thing about the result is that the prediction of the first output (i.e. the first output of the trained ANN on the test inputs) tracks the real value well in the lower parts of the curve, but seems to hit an unwanted upper bound when the real value rises.
I changed the number of hidden neurons, but it still behaves like this. Even if I test the trained ANN on the original training samples, it still hits an upper bound like this.
Does anyone have an intuition or advice on what's wrong here? Thanks!
Try normalizing the values (both inputs and outputs) to the range (-1, +1).
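A minimal numpy sketch of that suggestion (not pybrain-specific code): column-wise min-max scaling to (-1, +1), keeping the per-column min and max so predictions can be mapped back to the original units.

    import numpy as np

    def scale_to_unit(a):
        # Column-wise min-max scaling to (-1, +1); assumes every column varies (max > min).
        lo, hi = a.min(axis=0), a.max(axis=0)
        return 2.0 * (a - lo) / (hi - lo) - 1.0, lo, hi

    def unscale(a_scaled, lo, hi):
        # Map scaled values (e.g. network outputs) back to the original units.
        return (a_scaled + 1.0) / 2.0 * (hi - lo) + lo

    X = np.random.rand(100, 14) * 1e6             # placeholder with large-magnitude values
    X_scaled, x_lo, x_hi = scale_to_unit(X)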
I use a neural network with 3 layers for a categorization problem: 1) ~2k neurons 2) ~2k neurons 3) 20 neurons. My training set consists of 2 examples, and most of the inputs in each example are zeros. For some reason, after backpropagation training the network gives virtually the same output for both examples (which is either valid for only one of the examples, or has 1.0 for the outputs where one of the examples has 1s). It reaches this state after the first epoch and doesn't change much afterwards, even if the learning rate is the smallest possible double value. I use the sigmoid as the activation function.
I thought something might be wrong with my code, so I tried the AForge open source library, and it seems to suffer from the same issue.
What might be the problem here?
Solution: I removed one layer and decreased the number of neurons in the hidden layer to 800.
2000 by 2000 by 20 is huge. That's approximately 4 million weights to determine, meaning the algorithm has to search a 4-million-dimensional space. Any optimization algorithm will be totally at a loss in this case. I'm assuming you're using gradient descent, which is not even that powerful, so the algorithm is likely stuck in a local optimum somewhere in this gigantic search space.
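For reference, the rough count behind that "4 million weights" figure, assuming the layer sizes are exactly 2000, 2000 and 20 and ignoring biases:

    # Fully connected 2000 -> 2000 -> 20 network, weights only.
    n_weights = 2000 * 2000 + 2000 * 20
    print(n_weights)    # 4040000, i.e. roughly 4 million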
Simplify your model!
Added:
Please also describe in more detail what you're trying to do. Do you really have only 2 training examples? That's like trying to categorize 2 points using a model with 4 million parameters. It doesn't make sense to me.
You mentioned that most of the inputs are zero. To reduce the size of your search space, try removing redundancy in your training examples. For instance, if
trainingExample[0].inputValue[i] == trainingExample[1].inputValue[i]
then inputValue[i] carries no information the NN can use to distinguish the examples.
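A minimal numpy sketch of that redundancy check (the placeholder array stands in for your two input vectors; the shape is an assumption): keep only the input columns that actually differ between the two examples.

    import numpy as np

    X = np.random.randint(0, 2, size=(2, 2000))   # placeholder for your 2 training input vectors
    keep = X[0] != X[1]                           # True for columns that differ between the examples
    X_reduced = X[:, keep]                        # only the columns that can distinguish the two
    print(X.shape, "->", X_reduced.shape)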
Also, perhaps it goes without saying, but two training examples is a very small training set.