How many units does the hidden layer have? - neural-network

I'm getting started with artificial neural network programming. I would like to know if there is some sort of calculation to determine the exact number of units a hidden layer in a feedforward multilayer network should have, according to the number of inputs and outputs it has. For example, in the classic XOR function, there are 2 inputs and 1 output. How do I know that the hidden layer should have, say, 3 units?

Roughly speaking:
more linear problem => fewer hidden nodes; more non-linear => more hidden nodes.
more generalisation => fewer hidden nodes; less generalisation => more hidden nodes.
accurate answer (at least for your training set) => more hidden nodes; approximate answer => fewer hidden nodes.
FYI: in the case of XOR, if both inputs are connected straight to the output, then a single additional hidden node is required. If no input-to-output connections are allowed, then two hidden nodes is the minimum.
In answer to the question of whether there is a formula giving the exact number of hidden nodes for problems in general - no.
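To make the two-hidden-node claim concrete, here is a minimal numpy sketch with hand-picked weights (not trained; purely illustrative), assuming sigmoid activations and no direct input-to-output connections:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hand-picked weights (illustrative, not learned): hidden unit 1
# approximates OR, hidden unit 2 approximates AND, and the output
# computes roughly "OR and not AND", which is XOR.
W_hidden = np.array([[20.0, 20.0],    # into hidden unit 1 (OR-like)
                     [20.0, 20.0]])   # into hidden unit 2 (AND-like)
b_hidden = np.array([-10.0, -30.0])
W_out = np.array([20.0, -20.0])
b_out = -10.0

for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    h = sigmoid(W_hidden @ np.array(x, dtype=float) + b_hidden)
    y = sigmoid(W_out @ h + b_out)
    print(x, round(float(y)))  # prints 0, 1, 1, 0
```

A learning algorithm would have to find weights like these itself, but the point stands: two hidden sigmoid units are enough capacity for XOR.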

No
The short, but correct, answer is that there isn't any definition of "the right number" of hidden nodes in a layer. There are a few guidelines though, such as not using more hidden nodes in a given layer than there are input signals.
Configuring your network
The bottom line is that you have to calibrate the number of hidden nodes against your particular dataset or problem instance. It is important to remember that using as few hidden nodes as possible is favorable, as this helps the network generalize.
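As an illustration of calibrating against your dataset, one common approach is to sweep over candidate hidden-layer sizes and compare cross-validated scores, then keep the smallest size that still performs well. A rough sketch using scikit-learn (the synthetic dataset is a stand-in for your real one):

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for your real dataset
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# Try a few hidden-layer sizes; prefer the smallest that scores well
for n_hidden in (2, 4, 8, 16, 32):
    clf = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=2000,
                        random_state=0)
    score = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{n_hidden:3d} hidden nodes: mean CV accuracy = {score:.3f}")
```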

Related

How does one calculate the target outputs of neurons in hidden layers of a neural network?

In a simple single-layer network, it is easy to calculate the target outputs of neurons, as they are identical to the target outputs of the network itself. However, in a multiple-layer network, I am not quite sure how to calculate the targets for each individual neuron in the hidden layers, because they do not necessarily have a direct connection to the final output and are most likely not given in the training data. How would one find these values?
I would not be surprised if I am missing something and am going about this incorrectly, but I would like to know nonetheless. Thanks in advance for any and all input.
Taken from this great guide on pg. 18:
Calculate the Errors for the hidden layer neurons. Unlike the output layer we can't calculate these directly (because we don't have a Target), so we Back Propagate them from the output layer (hence the name of the algorithm). This is done by taking the Errors from the output neurons and running them back through the weights to get the hidden layer errors.
Or in other words, you don't. You propagate the activations from the input to the output, calculate the error of the output, then backpropagate the error from the output back to the input (thus the name of the algorithm).
In the unfortunate case that the link I posted goes down, it can be found by Googling "backpropagation algorithm 3".
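In code, "running the errors back through the weights" boils down to a multiplication by the transposed weight matrix, scaled by the activation derivative. A minimal numpy sketch for a single hidden layer, assuming sigmoid activations and squared error:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
x = rng.normal(size=3)        # one input sample with 3 features
target = np.array([1.0])      # its target output

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # input -> hidden (4 units)
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # hidden -> output

# Forward pass
h = sigmoid(W1 @ x + b1)
y = sigmoid(W2 @ h + b2)

# The output error is direct: the training data gives us a target here
delta_out = (y - target) * y * (1 - y)

# The hidden "errors" are not in the training data; they are obtained by
# running the output error back through the weights W2
delta_hidden = (W2.T @ delta_out) * h * (1 - h)

print(delta_hidden)  # per-hidden-unit error, used to update W1 and b1
```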

Why is softmax not used in hidden layers [duplicate]

This question already has answers here:
Why use softmax only in the output layer and not in hidden layers?
I have read the answer given here. My exact question pertains to the accepted answer:
Variables independence : a lot of regularization and effort is put to keep your variables independent, uncorrelated and quite sparse. If you use softmax layer as a hidden layer - then you will keep all your nodes (hidden variables) linearly dependent which may result in many problems and poor generalization.
What complications arise from forgoing variable independence in hidden layers? Please provide at least one example. I know hidden variable independence helps a lot in codifying the backpropagation, but backpropagation can be codified for softmax as well (please verify whether I am correct in this claim; I seem to have gotten the equations right, hence the claim).
Training issue: try to imagine that, to make your network work better, you have to make a part of the activations from your hidden layer a little bit lower. Then, automatically, you are making the rest of them have a higher mean activation, which might in fact increase the error and harm your training phase.
I don't understand how this is different from a sigmoid hidden neuron, where you can fine-tune the activation of a particular given neuron, which is precisely the job of gradient descent. So why are we even worried about this issue? If you can implement backprop, the rest will be taken care of by gradient descent. Fine-tuning the weights by hand so as to make the activations proper is not something you would want to do, even if you could, which you can't. (Kindly correct me if my understanding is wrong here.)
Mathematical issue: by creating constraints on the activations of your model, you decrease its expressive power without any logical explanation. The strive for having all activations the same is not worth it in my opinion.
Kindly explain what is being said here.
Batch normalization: I understand this; no issues here.
1/2. I don't think you have grasped what the author is trying to say. Imagine a layer with 3 nodes, where 2 of these nodes have an error responsibility of 0 with respect to the output error, so there is one node that should be adjusted. With softmax, if you want to improve the output of node 0, then you immediately affect nodes 1 and 2 in that layer - possibly making the output even more wrong.
Fine-tuning the weights by hand so as to make the activations proper is not something you would want to do, even if you could, which you can't. (Kindly correct me if my understanding is wrong here.)
That is the definition of backpropagation; that is exactly what you want. Neural networks rely on (non-linear) activations to model a function.
3. You're basically saying to every neuron 'hey, your output cannot be higher than x, because some other neuron in this layer already has value y'. Because all neurons in a softmax layer must have a total activation of 1, no neuron can exceed a certain value. For small layers this is a small problem, but for big layers it is a big one. Imagine a layer with 100 neurons whose total output must be 1: their average value will be 0.01. That means the connections out of that layer receive very low activations on average, whereas other activation functions output (or take as input) values spanning the range (0, 1) or (-1, 1).
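Both points are easy to check numerically: softmax outputs must sum to 1, so nudging one logit moves every other unit's activation, while sigmoid units are independent; and with many units, the average softmax activation is pinned at 1/n. A small sketch with made-up numbers:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([1.0, 0.5, -0.5])
z_bumped = z.copy()
z_bumped[0] += 2.0           # increase only the first unit's input

print(softmax(z))            # roughly [0.55 0.33 0.12]
print(softmax(z_bumped))     # unit 0 rises; units 1 and 2 BOTH fall
print(sigmoid(z))
print(sigmoid(z_bumped))     # only unit 0 changes; 1 and 2 are untouched

# With 100 softmax units the average activation is pinned at 1/100
print(softmax(np.zeros(100)).mean())  # 0.01
```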

How is the number of hidden and output neurons calculated for a neural network?

I'm very new to neural networks, but I am trying to create one for optical character recognition. I have 100 images of every number from 0-9, each of size 24x14. The number of inputs for the neural network is therefore 336, but I don't know how to choose the number of hidden neurons and output neurons.
How do I calculate it?
While the number of output neurons should equal the number of classes you want to discriminate, the size of the hidden layer is not so straightforward to set; it mainly depends on the trade-off between the complexity of the model and its generalization capabilities (see https://en.wikipedia.org/wiki/Artificial_neural_network#Computational_power).
The answers to this question can help:
training feedforward neural network for OCR
The number of output neurons is simply your number of classes (unless you only have 2 classes and are not using the one-hot representation, in which case you can make do with just one output neuron).
The number of hidden layers, and subsequently the number of hidden neurons, is not as straightforward as you might think as a beginner. Every problem will have a different configuration that works for it. You have to try multiple things out. Just keep the following in mind:
The more layers you add, the more complex your calculations become and hence, the slower your network will train.
One of the best and easiest practices is to keep the number of hidden neurons fixed in each layer.
Keep in mind what hidden neurons in each layer mean. The input layer is your starting features and each subsequent hidden layer is what you do with those features.
Think about your problem and the features you are using. If you are dealing with images, you might want a large number of neurons in your first hidden layer to break apart your features into smaller units.
Usually your results will not vary much once you increase the number of neurons past a certain point, and you'll get a feel for this as you practice more. Just keep in mind the trade-offs you are making; a small sketch follows.
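Putting this question's numbers together (24x14 = 336 inputs, 10 digit classes), the skeleton might look like the following PyTorch sketch; the hidden size of 64 is an arbitrary starting point to tune, not a recommendation:

```python
import torch
import torch.nn as nn

n_inputs, n_hidden, n_classes = 24 * 14, 64, 10  # 336 pixels, 10 digits

model = nn.Sequential(
    nn.Linear(n_inputs, n_hidden),   # hidden layer: size is the knob to tune
    nn.ReLU(),
    nn.Linear(n_hidden, n_classes),  # one output neuron per digit class
)

# One fake flattened image, just to check the shapes
x = torch.rand(1, n_inputs)
logits = model(x)
print(logits.shape)          # torch.Size([1, 10])
print(logits.argmax(dim=1))  # index of the predicted digit class
```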
Good luck :)

Artificial Neural Network layers

I have decided to try to make a recognition system, and I want to start with pictures of, say, 16x16 pixels. That will be 256 INPUT NEURONS.
Now, the output neurons are essentially how many results I want, so say I want to distinguish the letters A, B and C.
Then I need 3 OUTPUT NEURONS, right?
My question is, how can I know how many neurons I need in the hidden layer? And what was the purpose of them again? Is it how many character classes I want? Say, O and Q are quite similar, so they both would lead to one hidden layer neuron that later tells them apart?
You're right about the input and output layers.
How can I know how many neurons I need in the hidden layer?
There's no concrete rule that says exactly how many units you need in the hidden layers of a neural network. There are some general guidelines though, which I'll quote from one of my answers on Cross Validated.
Number of input units: Dimension of features x(i)
Number of output units: Number of classes
Reasonable default is one hidden layer, or if > 1 hidden layer, have the same number of hidden units in every layer (usually the more the better, anywhere from about 1X to 4X the number of input units).
You also asked:
And what was the purpose of them again?
The hidden layer units just transform the inputs into values (using coefficients selected during training) that can be used by the output layer.
Is it how many character classes I want? Say, O and Q are quite similar, so they both would lead to one hidden layer neuron that later tells them apart?
No, that's not right. The number of output units will be the same as the number of classes you want. Each output unit will correspond to one letter, and will say whether or not the input image is that letter (with some probability). The output unit with the highest probability is the one you select as the right letter.
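For instance, with the three classes A, B and C, selecting a letter from the output layer could look like this (the output values are made up):

```python
import numpy as np

letters = ["A", "B", "C"]
outputs = np.array([0.08, 0.85, 0.07])  # hypothetical network outputs

# Each output unit scores one letter; pick the one with the highest value
print(letters[int(np.argmax(outputs))])  # "B"
```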

Neural Network Approximation Function

I'm trying to test the efficiency of neural networks as function approximators.
The function I need to approximate has 5 inputs and 1 output; which structure should I use?
I have no idea what criteria should be applied in order to decide the number of hidden layers and the number of nodes for each layer.
Thank you in advance,
Regards
Giuseppe.
I always use a single hidden layer. Theoretically, there are no functions which can be approximated by 2 or more hidden layers that cannot be approximated with one. To make a single hidden layer more complex, add more hidden nodes.
Typically, the number of hidden nodes is varied to observe the effect on model performance (as measured by accuracy or whatever). Too few hidden nodes results in a worse fit due to underfitting (the neural network's output function is too simple, and misses important details in the data). Too many hidden nodes results in a worse fit due to overfitting (the neural network becomes so flexible that it chases every bit of noise in the data).
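A sketch of that procedure for a 5-input, 1-output problem like yours, using scikit-learn with a synthetic target function standing in for your real data; the idea is to watch where the test score peaks as the number of hidden nodes grows:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(400, 5))                # 5 inputs
y = np.sin(X).sum(axis=1) + rng.normal(0, 0.1, 400)  # 1 noisy output

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Vary the number of hidden nodes in a single hidden layer:
# too few underfits (both scores low); too many risks overfitting
# (train score high, test score drops)
for n_hidden in (1, 2, 5, 10, 50, 200):
    net = MLPRegressor(hidden_layer_sizes=(n_hidden,), max_iter=5000,
                       random_state=0)
    net.fit(X_tr, y_tr)
    print(f"{n_hidden:4d} hidden nodes: "
          f"train R^2 = {net.score(X_tr, y_tr):.3f}, "
          f"test R^2 = {net.score(X_te, y_te):.3f}")
```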
Note that for classification problems you need at least 2 hidden layers if you want to separate concave polygons.
I'm not sure how the number of hidden layers affects function approximation.