ANN bypassing hidden layer for an input - neural-network

I have just been set an assignment to calculate some ANN outputs and write an ANN. Simple stuff, I've done it before, so I don't need any help with general ANN stuff. However, there is something that is puzzling me. In the assignment, the topology is as follows (I won't upload the diagram as it is his intellectual property):
2 layers: 3 hidden nodes and one output node.
Input x1 goes to 2 hidden nodes and the output node.
Input x2 goes to 2 hidden nodes.
The problem is the ever so usual XOR. He has NOT mentioned anything about this kind of topology before, and I have definitely attended each lecture and listened intently. I am a good student like that :)
I don't think this counts as homework as I need no help with the actual tasks in hand.
Any insight as to why one would use a network with a topology like this would be brilliant.
Regards

Does the neural net look like the above picture? It looks like a common XOR topology with one hidden layer and a bias neuron. The bias neuron basically helps you shift the values of the activation function to the left or the right.
For more information on the role of the bias neuron, take a look at the following answers:
Role of Bias in Neural Networks
XOR problem solvable with 2x2x1 neural network without bias?
Why is a bias neuron necessary for a backpropagating neural network that recognizes the XOR operator?
Update
I was able to find some literature about this. Apparently it is possible for an input to skip the hidden layer and go to the output layer. This is called a skip layer and is used to model traditional linear regression in a neural network. This page from the book Neural Network Modeling Using SAS Enterprise Miner describes the concept. This page from the same book goes into a little more detail about the concept as well.
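For a concrete picture of what such a skip-layer connection looks like in a forward pass, here is a minimal NumPy sketch. The weights are made up and the hidden layer is generically fully connected (it does not reproduce the exact wiring of the assignment's diagram); the point is only that x1 feeds the output node directly, in addition to whatever reaches it through the hidden layer.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # Hypothetical weights, just to show the mechanics of a skip-layer connection.
    W_hidden = np.array([[ 0.5, -0.3],   # 3 hidden units x 2 inputs
                         [-0.7,  0.8],
                         [ 0.2,  0.6]])
    b_hidden = np.array([0.1, -0.2, 0.05])

    w_out  = np.array([1.2, -0.9, 0.4])  # hidden -> output weights
    w_skip = 0.7                         # direct (skip) connection: x1 -> output
    b_out  = -0.1

    def forward(x):
        """Forward pass where x[0] (i.e. x1) also feeds the output node directly."""
        h = sigmoid(W_hidden @ x + b_hidden)               # hidden activations
        return sigmoid(w_out @ h + w_skip * x[0] + b_out)  # skip term added in

    print(forward(np.array([1.0, 0.0])))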

Related

Back propagation with a simple ANN

I watched a lecture and derived equations for back propagation, but it was in a simple example with 3 neurons: an input neuron, one hidden neuron, and an output neuron. This was easy to derive, but how would I do the same with more neurons? I'm not talking about adding more layers, I'm just talking about adding more neurons to the already existing three layers: the input, hidden, and output layer.
My first guess would be to use the equations I've derived for the network with just 3 neurons and 3 layers and iterate across all possible paths to each of the output neurons in the larger network, updating each weight. However, this would cause certain weights to be updated more than once. Can I just do this or is there a better method?
If you want to learn more about backpropagation, I recommend reading this page from Stanford University: http://cs231n.github.io/optimization-2/. It will really help you understand backprop and all the math underneath.
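To make the "more neurons per layer" case concrete: the equations you derived for single neurons do not change, they just become vector and matrix operations, and each weight then gets exactly one update per backward pass, with no iterating over paths. Below is a minimal sketch under assumed conditions (sizes 3-5-2, sigmoid activations, squared error, no biases); it is illustrative only and not taken from the lecture in question.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)

    # Assumed sizes; widening a layer only changes these numbers.
    n_in, n_hidden, n_out = 3, 5, 2
    W1 = rng.normal(scale=0.5, size=(n_hidden, n_in))
    W2 = rng.normal(scale=0.5, size=(n_out, n_hidden))
    lr = 0.1

    def train_step(x, target):
        global W1, W2
        # Forward pass
        h = sigmoid(W1 @ x)   # hidden activations
        y = sigmoid(W2 @ h)   # output activations

        # Backward pass: same per-neuron equations, just vectorized.
        delta_out = (y - target) * y * (1 - y)        # one delta per output neuron
        delta_hid = (W2.T @ delta_out) * h * (1 - h)  # one delta per hidden neuron

        # Each weight is updated exactly once per pass.
        W2 -= lr * np.outer(delta_out, h)
        W1 -= lr * np.outer(delta_hid, x)
        return 0.5 * np.sum((y - target) ** 2)

    x, t = np.array([0.2, 0.7, -0.1]), np.array([1.0, 0.0])
    for _ in range(1000):
        loss = train_step(x, t)
    print(loss)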

Using a learned Artificial Neural Network to solve inputs

I've recently been delving into artificial neural networks again, both evolved and trained. I had a question regarding what methods, if any, exist to solve for inputs that would result in a target output set. Is there a name for this? Everything I try to look for leads me to backpropagation, which isn't necessarily what I need. In my search, the closest thing I've come to expressing my question is
Is it possible to run a neural network in reverse?
That question told me that there would, indeed, be many solutions for networks that have varying numbers of nodes in the layers, and that they would not be trivial to solve for. I had the idea of just marching toward an ideal set of inputs using the weights that were established during learning. Does anyone else have experience doing something like this?
In order to elaborate:
Say you have a network with 401 input nodes which represents a 20x20 grayscale image and a bias, two hidden layers consisting of 100+25 nodes, as well as 6 output nodes representing a classification (symbols, roman numerals, etc).
After training a neural network so that it can classify with an acceptable error, I would like to run the network backwards. This would mean I would feed a classification into the output that I would like to see, and the network would imagine a set of inputs that would result in the expected output. So for the Roman numeral example, this could mean that I would ask it to run the net in reverse for the symbol 'X', and it would generate an image resembling what the net thought an 'X' looked like. In this way, I could get a good idea of the features it learned to separate the classifications. I feel it would be very beneficial in understanding how ANNs function and learn in the grand scheme of things.
For a simple feed-forward, fully connected NN, it is possible to project a hidden unit's activation into pixel space by taking the inverse of the activation function (for example, the logit for sigmoid units), dividing it by the sum of the incoming weights, and then multiplying that value by the weight of each pixel. That gives a visualization of the average pattern recognized by this hidden unit. Summing up these patterns for each hidden unit results in the average pattern that corresponds to this particular set of hidden unit activities. The same procedure can in principle be applied to project output activations into hidden unit activity patterns.
This is indeed useful for analyzing what features NN learned in image recognition. For more complex methods you can take a look at this paper (besides everything it contains examples of patterns that NN can learn).
You cannot exactly run a NN in reverse, because it does not remember all the information from the source image - only the patterns that it learned to detect. So the network cannot "imagine a set of inputs". However, it is possible to sample a probability distribution (taking each weight as the probability of activation of each pixel) and produce a set of patterns that can be recognized by a particular neuron.
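As a rough illustration of the projection procedure described above (one possible reading of it), here is a sketch with assumed shapes: a 20x20 input, 25 sigmoid hidden units, and random placeholder weights standing in for a trained network.

    import numpy as np

    def logit(a, eps=1e-7):
        """Inverse of the sigmoid activation."""
        a = np.clip(a, eps, 1 - eps)
        return np.log(a / (1 - a))

    def project_hidden_to_pixels(W_in, hidden_activations):
        """
        W_in: (n_hidden, n_pixels) weights from pixels to hidden units.
        hidden_activations: (n_hidden,) sigmoid activations for one example.
        Returns an (n_pixels,) 'average pattern' in pixel space.
        """
        pattern = np.zeros(W_in.shape[1])
        for j, a in enumerate(hidden_activations):
            pre = logit(a)                  # inverse activation: pre-activation level
            scale = pre / np.sum(W_in[j])   # divide by the sum of incoming weights
            pattern += scale * W_in[j]      # spread back over pixels via the weights
        return pattern

    # Toy usage with random placeholders (purely illustrative).
    rng = np.random.default_rng(1)
    W_in = rng.normal(size=(25, 400))           # 25 hidden units, 20x20 image
    h = 1 / (1 + np.exp(-rng.normal(size=25)))  # fake hidden activations
    img = project_hidden_to_pixels(W_in, h).reshape(20, 20)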
I know that you can, and I am working on a solution now. I have some code on my GitHub here for imagining the inputs of a neural network that classifies the handwritten digits of the MNIST dataset, but I don't think it is entirely correct. Right now, I simply take a trained network and my desired output and multiply backwards by the learned weights at each layer until I have a value for the inputs. This skips over the activation function and may have some other errors, but I am getting pretty reasonable images out of it. For example, this is the result of the trained network imagining a 3.
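For illustration only, here is a sketch of the kind of backwards multiplication described above: a desired one-hot output is pushed back through the transposed weight matrices, skipping the activation functions. The layer sizes and weights are made-up placeholders, not taken from the linked repository.

    import numpy as np

    def imagine_input(weights, target):
        """
        weights: list of weight matrices [W1, W2, ...] from a trained network,
                 where each Wk has shape (n_out_k, n_in_k).
        target:  desired output vector (e.g. a one-hot class label).
        Pushes the target backwards through the transposed weights, ignoring
        activation functions, to get a rough 'imagined' input.
        """
        v = target
        for W in reversed(weights):
            v = W.T @ v
        # Rescale to [0, 1] so it can be viewed as a grayscale image.
        return (v - v.min()) / (v.max() - v.min() + 1e-12)

    # Toy usage with random weights standing in for a trained MNIST-style net.
    rng = np.random.default_rng(2)
    W1 = rng.normal(size=(100, 784))
    W2 = rng.normal(size=(10, 100))
    img = imagine_input([W1, W2], np.eye(10)[3]).reshape(28, 28)  # "imagine" a 3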
Yes, you can run a probabilistic NN in reverse to get it to 'imagine' inputs that would match an output it's been trained to categorise.
I highly recommend Geoffrey Hinton's Coursera course on NNs here:
https://www.coursera.org/course/neuralnets
He demonstrates in his introductory video a NN imagining various "2"s that it would recognise having been trained to identify the numerals 0 through 9. It's very impressive!
I think it's basically doing exactly what you're looking to do.
Gruff

Neural network value calculation?

I have 3 neurons x1, x2, x3. I know the value at the output is overshooting the desired result (it gives the wrong answer) and my weights need new values, but what new value should be set for each neuron's weight? How do I calculate that?
One way is to divide (desired value - output value) by 3 and assign that amount to each neuron... but that won't work for new inputs, as no proper learning takes place.
From your question it seems you do not yet understand how neural networks really work.
First of all, neural networks are a class of algorithms that fall under machine learning techniques. Therefore, they learn, either unsupervised, supervised, or through reinforcement. This of course requires a learning paradigm. For neural networks, the most well-studied supervised training method is backpropagation. However, to understand how this works, you first need to understand how a network is built.
A description of what is a neural network and its foundations can be seen here: http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html
One practical explanation of how you can implement a functional network through backpropagation can be seen here: http://galaxy.agh.edu.pl/~vlsi/AI/backp_t_en/backprop.html
If you read these you will probably know enough to answer your question.
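For reference, the standard alternative to splitting the error evenly is to move each weight in proportion to its own gradient. Here is a minimal sketch of that idea for a single sigmoid output neuron with three inputs (the delta rule); the weights, learning rate, and data are arbitrary placeholders.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    # One output neuron with three inputs x1, x2, x3 (placeholder weights).
    w = np.array([0.2, -0.4, 0.1])
    lr = 0.5

    def update(x, desired):
        """Delta rule: each weight moves in proportion to its own input and the
        gradient of the squared error, not by an even share of the error."""
        global w
        y = sigmoid(w @ x)
        error = desired - y
        w += lr * error * y * (1 - y) * x   # gradient of 0.5 * (desired - y)**2
        return error

    x = np.array([1.0, 0.5, -1.0])
    for _ in range(500):
        e = update(x, 1.0)
    print(e)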

Is there a rule/good advice on how big an artificial neural network should be?

My last lecture on ANN's was a while ago but I'm currently facing a project where I would want to use one.
So the basics - like what type (a multi-layer feedforward network), trained by an evolutionary algorithm (that's a given by the project), how many input neurons (8) and how many output neurons (7) - are set.
But I'm currently trying to figure out how many hidden layers I should use and how many neurons in each of these layers (the EA doesn't modify the network itself, only the weights).
Is there a general rule or maybe a guideline on how to figure this out?
The best approach for this problem is to implement the cascade correlation algorithm, in which hidden nodes are sequentially added as necessary to reduce the error rate of the network. This has been demonstrated to be very useful in practice.
An alternative, of course, is a brute-force test of various values. I don't think simple answers such as "10 or 20 is good" are meaningful because you are directly addressing the separability of the data in high-dimensional space by the basis function.
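A sketch of what such a brute-force test could look like: try several hidden-layer sizes and keep the one that scores best on held-out data. It uses scikit-learn's MLPRegressor as a stand-in trainer (the project's evolutionary algorithm would take its place) and random placeholder data with the question's 8 inputs and 7 outputs.

    import numpy as np
    from sklearn.neural_network import MLPRegressor

    # Placeholder data: 8 inputs, 7 outputs, as in the question.
    rng = np.random.default_rng(0)
    X_train, y_train = rng.normal(size=(200, 8)), rng.normal(size=(200, 7))
    X_val,   y_val   = rng.normal(size=(50, 8)),  rng.normal(size=(50, 7))

    best_size, best_score = None, -np.inf
    for n_hidden in [4, 8, 12, 16, 24, 32]:
        net = MLPRegressor(hidden_layer_sizes=(n_hidden,), max_iter=1000,
                           random_state=0)
        net.fit(X_train, y_train)
        score = net.score(X_val, y_val)   # R^2 on held-out data
        if score > best_score:
            best_size, best_score = n_hidden, score

    print(best_size, best_score)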
A typical neural net relies on hidden layers in order to converge on a particular problem solution. A hidden layer of about 10 neurons is standard for networks with few input and output neurons. However, a trial and error approach often works best. Since the neural net will be trained by a genetic algorithm, the number of hidden neurons may not play a significant role in training, since it's the weights and biases on the neurons that would be modified by an algorithm like backpropagation.
As rcarter suggests, trial and error might do fine, but there's another thing you could try.
You could use genetic algorithms to determine the number of hidden layers and the number of neurons in them.
I did similar things with a bunch of random forests, to try and find the best number of trees, branches, and parameters given to each tree, etc.

How to implement Q-learning with a neural network?

I have created a neural network with 2 input nodes, 4 hidden nodes and 3 output nodes. The initial weights are random, between -1 and 1. I used the backpropagation method to update the network with the TD error. However, the performance is not good.
I want to know where the problem might be:
1. Is a bias node necessary?
2. Are eligibility traces necessary?
If anyone can provide any sample code, I would be very grateful.
Yes, you should include the bias nodes, and yes you should use eligibility traces. The bias nodes just give one additional tunable parameter. Think of the neural network as a "function approximator" as described in Sutton and Barto's book (free online). If the neural network has parameters theta (a vector containing all of the weights in the network), then the Sarsa update is just (using LaTeX notation):
\delta_t = r_t + \gamma*Q(s_{t+1},a_{t+1},\theta_t) - Q(s_t,a_t, \theta_t)
\theta_{t+1} = \theta_t + \alpha*\delta_t*\frac{\partial Q(s,a,\theta)}{\partial \theta}
This is for any function approximator Q(s,a,\theta), which estimates Q(s,a) by tuning its parameters, \theta.
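As an illustration of that update for a network of the size described in the question (2 inputs, 4 hidden units, 3 actions), here is a minimal sketch with a sigmoid hidden layer, linear outputs, and no eligibility traces; the weights, step sizes, and transition are placeholders.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    rng = np.random.default_rng(0)
    # Assumed sizes from the question: 2 state inputs, 4 hidden units, 3 actions.
    W1 = rng.uniform(-1, 1, size=(4, 2))
    W2 = rng.uniform(-1, 1, size=(3, 4))
    alpha, gamma = 0.05, 0.99

    def q_values(s):
        """Q(s, ., theta): sigmoid hidden layer, linear output per action."""
        h = sigmoid(W1 @ s)
        return W2 @ h, h

    def sarsa_update(s, a, r, s_next, a_next):
        """One gradient-Sarsa step: theta <- theta + alpha * delta * dQ/dtheta."""
        global W1, W2
        q, h = q_values(s)
        q_next, _ = q_values(s_next)
        delta = r + gamma * q_next[a_next] - q[a]   # TD error

        # Gradient of Q(s, a, theta) with respect to the weights (chosen action only).
        grad_W2 = np.zeros_like(W2)
        grad_W2[a] = h
        grad_W1 = np.outer(W2[a] * h * (1 - h), s)

        W2 += alpha * delta * grad_W2
        W1 += alpha * delta * grad_W1

    # Toy usage with a made-up transition.
    sarsa_update(np.array([0.1, -0.3]), a=1, r=1.0,
                 s_next=np.array([0.0, 0.2]), a_next=2)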
However, I must ask why you're doing this. If you're just trying to get Q learning working really well, then you should use the Fourier Basis instead of a neural network:
http://all.cs.umass.edu/pubs/2011/konidaris_o_t_11.pdf
If you really want to use a neural network for RL, then you should use a natural actor-critic (NAC). NACs follow something called the "natural gradient," which was developed by Amari specifically to speed up learning using neural networks, and it makes a huge difference.
We need more information. What is the problem domain? What are the inputs? What are the outputs?
RL can take a very long time to train and, depending on how you're training, can go from good to great to good to not-so-good during training. Therefore, you should plot the performance of your agent during learning, not just the end result.
You should always use bias nodes. Eligibility traces? Probably not.