How are the bias neurons created in NEAT?

I am trying to implement a simple NEAT. I have read from various sources that there are four types of "nodes": input neurons, hidden neurons, output neurons, and so-called bias neurons. I don't see which process may create bias neurons, which are depicted in this paper on page 16.
I understand that new neurons may be created while mutating, but that requires an existing connection between two neurons which is split by the new neuron (based on the paper already mentioned, page 10). However, a bias neuron has no "input" connection, so it clearly can't be created in the mentioned way. How, then, in detail, does NEAT create bias neurons?

A bias neuron (node) in the NEAT context is simply a special input neuron that is always active. It is always included by construction since it seems to help evolution in many cases.
So, in short, you do not create bias neurons, just as you do not create new input or output nodes; these are defined by your problem.
You are correct that the standard NEAT implementation introduces new hidden nodes by splitting existing connections. Hidden nodes are the only neurons you will create or destroy in NEAT (and, to my knowledge, in neuroevolution in general).
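To make that concrete, here is a minimal sketch (my own illustration, not code from the NEAT paper) of how a genome's node list is typically initialized: the bias node is created up front as an extra input-like node that always emits 1.0, and only hidden nodes are ever added later by the split-connection mutation.

```python
from dataclasses import dataclass, field

@dataclass
class NodeGene:
    node_id: int
    node_type: str   # "input", "bias", "hidden", or "output"

@dataclass
class Genome:
    nodes: list = field(default_factory=list)

def initial_genome(n_inputs: int, n_outputs: int) -> Genome:
    """Build the fixed part of a genome; only hidden nodes are added later by mutation."""
    g = Genome()
    next_id = 0
    for _ in range(n_inputs):
        g.nodes.append(NodeGene(next_id, "input")); next_id += 1
    g.nodes.append(NodeGene(next_id, "bias"))       # always outputs 1.0 during activation
    next_id += 1
    for _ in range(n_outputs):
        g.nodes.append(NodeGene(next_id, "output")); next_id += 1
    return g
```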

Related

Why is softmax not used in hidden layers [duplicate]

This question already has answers here: Why use softmax only in the output layer and not in hidden layers?
I have read the answer given here. My exact question pertains to the accepted answer:
Variables independence: a lot of regularization and effort is put into keeping your variables independent, uncorrelated and quite sparse. If you use a softmax layer as a hidden layer, then you will keep all your nodes (hidden variables) linearly dependent, which may result in many problems and poor generalization.
What complications arise from forgoing variable independence in the hidden layers? Please provide at least one example. I know hidden-variable independence helps a lot in codifying the backpropagation, but backpropagation can be codified for softmax as well (please verify whether or not I am correct in this claim; I seem to have gotten the equations right, hence the claim).
Training issue: try to imagine that, to make your network work better, you have to make part of the activations of your hidden layer a little bit lower. Then, automatically, you are making the rest of them have a higher mean activation, which might in fact increase the error and harm your training phase.
I don't understand how you achieve that kind of flexibility even with sigmoid hidden neurons, where you can fine-tune the activation of a particular neuron, which is precisely gradient descent's job. So why are we even worried about this issue? If you can implement the backprop, the rest will be taken care of by gradient descent. Fine-tuning the weights so as to make the activations proper is not something you would want to do by hand, even if you could (which you can't). (Kindly correct me if my understanding is wrong here.)
Mathematical issue: by creating constraints on the activations of your model you decrease its expressive power without any logical explanation. The striving to have all activations the same is not worth it, in my opinion.
Kindly explain what is being said here.
Batch normalization: I understand this; no issues here.
1/2. I don't think you have quite grasped what the author is trying to say. Imagine a layer with 3 nodes. Two of these nodes have an error responsibility of 0 with respect to the output error, so there is one node that should be adjusted. But with a softmax layer, if you want to improve the output of node 0, then you immediately affect nodes 1 and 2 in that layer, possibly making the output even more wrong.
Fine-tuning the weights so as to make the activations proper is not something you would want to do by hand, even if you could (which you can't). (Kindly correct me if my understanding is wrong here.)
That is the definition of backpropagation. That is exactly what you want. Neural networks rely on activations (which are non-linear) to map a function.
3. You're basically saying to every neuron: "hey, your output cannot be higher than x, because some other neuron in this layer already has value y". Because all neurons in a softmax layer must have a total activation of 1, no neuron can exceed a certain value. For small layers this is a small problem, but for big layers it is a big problem. Imagine a layer with 100 neurons whose total output must be 1. The average value of those neurons will be 0.01, which means the network's connections end up relying on activations that stay very low on average, whereas other activation functions output (or take as input) values in the range (0, 1) or (-1, 1).
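A small numeric illustration of that last point (my own sketch, not from the answer): with a softmax over a 100-unit layer, the activations are forced to sum to 1, so their mean is pinned at 0.01, and nudging a single logit shifts every other activation in the layer.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())   # shift for numerical stability
    return e / e.sum()

rng = np.random.default_rng(0)
z = rng.normal(size=100)      # logits of a hypothetical 100-unit hidden layer
a = softmax(z)
print(a.sum(), a.mean())      # -> 1.0 and 0.01: the mean activation is fixed

z2 = z.copy()
z2[0] += 1.0                  # raise one logit...
a2 = softmax(z2)
print(np.abs(a2[1:] - a[1:]).max())   # ...and every other activation changes too
```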

Can a convolutional neural network be built with perceptrons?

I was reading this interesting article on convolutional neural networks. It showed this image, explaining that for every receptive field of 5x5 pixels/neurons, the value of one hidden neuron is calculated.
We can think of max-pooling as a way for the network to ask whether a given feature is found anywhere in a region of the image. It then throws away the exact positional information.
So max-pooling is applied.
With multiple convolutional layers, it looks something like this:
But my question is: this whole architecture could be built with perceptrons, right?
For every convolutional layer, one perceptron is needed, with layers:
input_size = 5x5;
hidden_size = 10; e.g.
output_size = 1;
Then for every receptive field in the original image, the 5x5 area is fed into a perceptron to produce the value of a neuron in the hidden layer. So basically we do this for every receptive field:
So the same perceptron is used 24x24 times to construct the hidden layer, because:
we're going to use the same weights and bias for each of the 24×24 hidden neurons.
And this works from the hidden layer to the pooling layer as well, with input_size = 2x2; output_size = 1;. And in the case of a max-pool layer, it's just a max() function on an array.
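A rough sketch of that weight-sharing idea (my own illustration, not from the article): one shared 5x5 weight set, the "single perceptron", is slid over every receptive field of a 28x28 image to build a 24x24 feature map, and 2x2 max-pooling then reduces it to 12x12.

```python
import numpy as np

rng = np.random.default_rng(0)
image = rng.random((28, 28))
W = rng.normal(size=(5, 5))   # the single shared "perceptron" weights
b = 0.0                       # and its shared bias

# convolution: the same W and b for every 5x5 receptive field
feature_map = np.zeros((24, 24))
for i in range(24):
    for j in range(24):
        patch = image[i:i+5, j:j+5]
        feature_map[i, j] = np.tanh(np.sum(W * patch) + b)

# 2x2 max-pooling: just a max() over each 2x2 region
pooled = np.zeros((12, 12))
for i in range(12):
    for j in range(12):
        pooled[i, j] = feature_map[2*i:2*i+2, 2*j:2*j+2].max()
```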
and then finally:
The final layer of connections in the network is a fully-connected layer. That is, this layer connects every neuron from the max-pooled layer to every one of the 10 output neurons.
which is a perceptron again.
So my final architecture looks like this:
-> 1 perceptron for every convolutional layer/feature map
-> run this perceptron for every receptive field to create feature map
-> 1 perceptron for every pooling layer
-> run this perceptron for every field in the feature map to create a pooling layer
-> finally, input the values of the pooling layer into a regular all-to-all perceptron
Or am I overlooking something? Or is this already how they are programmed?
The answer very much depends on what exactly you call a Perceptron. Common options are:
Complete architecture. Then no, simply because it's by definition a different NN.
A model of a single neuron, specifically y = 1 if (w·x + b) > 0 else 0, where x is the input of the neuron, w and b are its trainable parameters and w·x denotes the dot product. Then yes, you can force a bunch of these perceptrons to share weights and call it a CNN (a minimal sketch follows after this list). You'll find variants of this idea being used in binary neural networks.
A training algorithm, typically associated with the Perceptron architecture. This reading does not really fit the question, because the learning algorithm is in principle orthogonal to the architecture. That said, you cannot really use the Perceptron algorithm for anything with hidden layers, which would suggest no as the answer in this case.
The loss function associated with the original Perceptron. This notion of Perceptron is orthogonal to the problem at hand: your loss function with a CNN is given by whatever you are trying to do with your whole model. You could eventually use it, but it is non-differentiable, so good luck :-)
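For reading (2), here is the threshold unit written out (my own sketch, not code from the answer); weight-sharing a unit like this across image patches, as in the sketch earlier in this thread, is the sense in which a "CNN built from perceptrons" can work.

```python
import numpy as np

def perceptron(x, w, b):
    """Single Perceptron unit: hard-threshold the affine combination w.x + b."""
    return 1 if np.dot(w, x) + b > 0 else 0
```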
A side-note rant: you will see people refer to feed-forward, fully-connected NNs with hidden layers as "Multilayer Perceptrons" (MLPs). This is a misnomer; there are no Perceptrons in MLPs (see e.g. this discussion on Wikipedia), unless you go exploring some really weird ideas. It would make more sense to call these networks Multilayer Linear Logistic Regression, because that is what they used to be composed of, up until about six years ago.

How is the number of hidden and output neurons calculated for a neural network?

I'm very new to neural networks, but I am trying to create one for optical character recognition. I have 100 images of every digit from 0-9, each of size 24x14. The number of inputs for the neural network is therefore 336, but I don't know how to choose the number of hidden neurons and output neurons.
How do I calculate it?
While for the output neurons the number should be equal to the number of classes you want to discriminate, for the hidden layer the size is not so straightforward to set: it mainly depends on the trade-off between the complexity of the model and its generalization capabilities (see https://en.wikipedia.org/wiki/Artificial_neural_network#Computational_power).
The answers to this question can help:
training feedforward neural network for OCR
The number of output neurons is simply your number of classes (unless you only have 2 classes and are not using a one-hot representation, in which case you can make do with just one output neuron).
The number of hidden layers, and subsequently the number of hidden neurons, is not as straightforward as you might think as a beginner. Every problem has a different configuration that works for it, so you have to try multiple things out. Just keep this in mind though:
The more layers you add, the more complex your calculations become and hence, the slower your network will train.
One of the best and easiest practices is to keep the number of hidden neurons fixed in each layer.
Keep in mind what hidden neurons in each layer mean. The input layer is your starting features and each subsequent hidden layer is what you do with those features.
Think about your problem and the features you are using. If you are dealing with images, you might want a large number of neurons in your first hidden layer to break apart your features into smaller units.
Usually your results will not vary much once you increase the number of neurons beyond a certain point, and you'll get a feel for this as you practice more. Just keep in mind the trade-offs you are making.
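As a concrete starting point, here is a minimal sketch (my own, with hypothetical layer sizes, assuming a Keras-style setup) for this OCR problem: 336 inputs from the flattened 24x14 images, one hidden layer whose size is a tunable hyperparameter, and 10 output neurons, one per digit class.

```python
import tensorflow as tf

n_inputs = 24 * 14    # 336 pixels per image
n_hidden = 64         # hypothetical starting point; tune against validation data
n_classes = 10        # digits 0-9

model = tf.keras.Sequential([
    tf.keras.layers.Dense(n_hidden, activation="relu", input_shape=(n_inputs,)),
    tf.keras.layers.Dense(n_classes, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```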
Good luck :)

Back propagation with a simple ANN

I watched a lecture and derived equations for back propagation, but it was in a simple example with 3 neurons: an input neuron, one hidden neuron, and an output neuron. This was easy to derive, but how would I do the same with more neurons? I'm not talking about adding more layers, I'm just talking about adding more neurons to the already existing three layers: the input, hidden, and output layer.
My first guess would be to use the equations I've derived for the network with just 3 neurons and 3 layers and iterate across all possible paths to each of the output neurons in the larger network, updating each weight. However, this would cause certain weights to be updated more than once. Can I just do this or is there a better method?
If you want to learn more about backpropagation, I recommend reading this page from Stanford University: http://cs231n.github.io/optimization-2/. It will really help you understand backprop and all the math underneath.
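To address the question directly, here is a minimal sketch (my own illustration, not from the answer) of how the one-neuron-per-layer equations generalize: with more neurons per layer, the same chain rule applies, but the scalars become vectors and matrices, so each weight is still updated exactly once per backward pass rather than once per path.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 4, 5, 3          # hypothetical layer sizes
W1 = rng.normal(size=(n_hidden, n_in));  b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(n_out, n_hidden)); b2 = np.zeros(n_out)

x = rng.normal(size=n_in)                # one training example
y = np.array([1.0, 0.0, 0.0])            # its target

# forward pass
h = sigmoid(W1 @ x + b1)
o = sigmoid(W2 @ h + b2)

# backward pass (squared-error loss): one delta vector per layer
delta_out = (o - y) * o * (1 - o)             # error at the output layer
delta_hid = (W2.T @ delta_out) * h * (1 - h)  # error propagated to the hidden layer

# gradients: outer products give exactly one update per weight
grad_W2 = np.outer(delta_out, h)
grad_W1 = np.outer(delta_hid, x)

lr = 0.1
W2 -= lr * grad_W2; b2 -= lr * delta_out
W1 -= lr * grad_W1; b1 -= lr * delta_hid
```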

ANN bypassing hidden layer for an input

I have just been set an assignment to calculate some ANN outputs and write an ANN. Simple stuff, I've done it before, so I don't need any help with general ANN stuff. However, there is something that is puzzling me. In the assignment, the topology is as follows (I won't upload the diagram as it is his intellectual property):
2 layers, 3 hidden nodes and one output.
Input x1 goes to 2 hidden nodes and the output node.
Input x2 goes to 2 hidden nodes.
The problem is the ever-so-usual XOR. He has NOT mentioned anything about this kind of topology before, and I have definitely attended every lecture and listened intently. I am a good student like that :)
I don't think this counts as homework as I need no help with the actual tasks in hand.
Any insight as to why one would use a network with a topology like this would be brilliant.
Regards
Does the neural net look like the above picture? It looks like a common XOR topology with one hidden layer and a bias neuron. The bias neuron basically helps you shift the values of the activation function to the left or the right.
For more information on the role of the bias neuron, take a look at the following answers:
Role of Bias in Neural Networks
XOR problem solvable with 2x2x1 neural network without bias?
Why is a bias neuron necessary for a backpropagating neural network that recognizes the XOR operator?
Update
I was able to find some literature about this. Apparently it is possible for an input to skip the hidden layer and go straight to the output layer. This is called a skip layer and is used to model traditional linear regression within a neural network. This page from the book Neural Network Modeling Using SAS Enterprise Miner describes the concept, and this page from the same book goes into a little more detail.
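Here is a minimal sketch of such a skip connection (my own illustration, assuming a Keras functional-API style, not code from either book): x1 feeds the hidden layer and also goes straight to the output node, while x2 only reaches the output through the hidden layer, a simplified version of the topology in the question (here both inputs feed all three hidden nodes).

```python
import tensorflow as tf

inputs = tf.keras.Input(shape=(2,))                       # x1 and x2
hidden = tf.keras.layers.Dense(3, activation="sigmoid")(inputs)
x1 = tf.keras.layers.Lambda(lambda t: t[:, :1])(inputs)   # x1 skips the hidden layer
merged = tf.keras.layers.Concatenate()([hidden, x1])
output = tf.keras.layers.Dense(1, activation="sigmoid")(merged)

model = tf.keras.Model(inputs, output)
model.compile(optimizer="adam", loss="binary_crossentropy")
```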