i need a way to train a neural network other than backpropagation - neural-network

This is an on-going venture and some details are purposefully obfuscated.
I have a box that has several inputs and one output. The output voltage changes as the input voltages are changed. The desirability of the output sequence cannot be evaluated until many states pass and a look back process is evaluated.
I want to design a neural network that takes a number of outputs from the box as input and produce the correct input settings for the box to produce the optimal next output.
I cannot train this network using backpropagation. How do I train this network?

Genetic algorithm would be a good candidate here. A chromosome could encode the weights of the neural network. After evaluation you assign a fitness value to the chromosomes based on their performance. Chromosomes with higher fitness value have a higher chance to reproduce, helping to generate better performing chromosomes in the next generation.
Encoding the weights is a relatively simple solution, more complex ones could even define the topology of the network.
You might find some additional helpful information here:
http://en.wikipedia.org/wiki/Neuroevolution

Hillclimbing is the simplest optimization algorithm to implement. Just randomly modify the weights, see if it does better, if not reset them and try again. It's also generally faster than genetic algorithms. However it is prone to getting stuck in local optima, so try running it several times and selecting the best result.

Related

Confusion in Backpropagation

I've started working on Forward and back propagation of neural networks. I've coded it as-well and works properly too. But i'm confused in the algorithm itself. I'm new to Neural Networks.
So Forward propagation of neural networks is finding the right label with the given weights?
and Back-propagation is using forward propagation to find the most error free parameters by minimizing cost function and using these parameters to help classify other training examples? And this is called a trained Neural Network?
I feel like there is a big blunder in my concept if there is please let me know where i'm wrong and why i am wrong.
I will try my best to explain forward and back propagation in a detailed yet simple to understand manner, although it's not an easy topic to do.
Forward Propagation
Forward propagation is the process in a neural network where-by during the runtime of the network, values are fed into the front of the neural network, (the inputs). You can imagine that these values then travel across the weights which multiply the original value from the inputs by themselves. They then arrive at the hidden layer (neurons). Neurons vary quite a lot based on different types of networks, but here is one way of explaining it. When the values reach the neuron they go through a function where every single value being fed into the neuron is summed up and then fed into an activation function. This activation function can be very different depending on the use-case but let's take for example a linear activation function. It essentially gets the value being fed into it and then it rounds it to a 0 or 1. It is then fed through more weights and then it is spat out into the outputs. Which is the last step into the network.
You can imagine this network with this diagram.
Back Propagation
Back propagation is just like forward propagation except we work backwards from where we were in forward propagation.
The aim of back propagation is to reduce the error in the training phase (trying to get the neural network as accurate as possible). The way this is done is by going backwards through the weights and layers. At each weight the error is calculated and each weight is individually adjusted using an optimization algorithm; optimization algorithm is exactly what it sounds like. It optimizes the weights and adjusts their values to make the neural network more accurate.
Some optimization algorithms include gradient descent and stochastic gradient descent. I will not go through the details in this answer as I have already explained them in some of my other answers (linked below).
The process of calculating the error in the weights and adjusting them accordingly is the back-propagation process and it is usually repeated many times to get the network as accurate as possible. The number of times you do this is called the epoch count. It is good to learn the importance of how you should manage epochs and batch sizes (another topic), as these can severely impact the efficiency and accuracy of your network.
I understand that this answer may be hard to follow, but unfortunately this is the best way I can explain this. It is expected that you might not understand this the first time you read it, but remember this is a complicated topic. I have a linked a few more resources down below including a video (not mine) that explains these processes even better than a simple text explanation can. But I also hope my answer may have resolved your question and have a good day!
Further resources:
Link 1 - Detailed explanation of back-propagation.
Link 2 - Detailed explanation of stochastic/gradient-descent.
Youtube Video 1 - Detailed explanation of types of propagation.
Credits go to Sebastian Lague

How does neural network know which reward it got from action?

I am current working on making a Deep q-network and i a bit confused about how my Q-network knows which reward i give it.
For example I have this state action function with policy and temporal difference:
and then I have my Q-network:
Where I input my states and I get 4 different q values in the same observation. Theory wise how do I reward my Q-network because my only inputs are the state but not the reward.
I hope one can explain me this!
You should be familiar with training and inference.
In the training phase, you provide inputs and the desired outputs to the neural network. The exact way in which you encode the desired outputs can vary; one way is to define a reward function. The weights adjustment procedure is then defined to optimize the reward
In production, the network is used for inference. You now use it to predict the unknown outcomes, but you don't update the weights. Therefore, you don't have a reward function in this phase.
This makes neural networks a form of supervised learning. If you need unsupervised learning, you generally have a bigger problem, and might need different algorithms. One sort-of exception is when you can automatically evaluate the quality of your predictions in hindsight. An example of this is the branch predictor of CPU's; this can be trained using the actual data from branches taken.

Using a learned Artificial Neural Network to solve inputs

I've recently been delving into artificial neural networks again, both evolved and trained. I had a question regarding what methods, if any, to solve for inputs that would result in a target output set. Is there a name for this? Everything I try to look for leads me to backpropagation which isn't necessarily what I need. In my search, the closest thing I've come to expressing my question is
Is it possible to run a neural network in reverse?
Which told me that there, indeed, would be many solutions for networks that had varying numbers of nodes for the layers and they would not be trivial to solve for. I had the idea of just marching toward an ideal set of inputs using the weights that have been established during learning. Does anyone else have experience doing something like this?
In order to elaborate:
Say you have a network with 401 input nodes which represents a 20x20 grayscale image and a bias, two hidden layers consisting of 100+25 nodes, as well as 6 output nodes representing a classification (symbols, roman numerals, etc).
After training a neural network so that it can classify with an acceptable error, I would like to run the network backwards. This would mean I would input a classification in the output that I would like to see, and the network would imagine a set of inputs that would result in the expected output. So for the roman numeral example, this could mean that I would request it to run the net in reverse for the symbol 'X' and it would generate an image that would resemble what the net thought an 'X' looked like. In this way, I could get a good idea of the features it learned to separate the classifications. I feel as it would be very beneficial in understanding how ANNs function and learn in the grand scheme of things.
For a simple feed-forward fully connected NN, it is possible to project hidden unit activation into pixel space by taking inverse of activation function (for example Logit for sigmoid units), dividing it by sum of incoming weights and then multiplying that value by weight of each pixel. That will give visualization of average pattern, recognized by this hidden unit. Summing up these patterns for each hidden unit will result in average pattern, that corresponds to this particular set of hidden unit activities.Same procedure can be in principle be applied to to project output activations into hidden unit activity patterns.
This is indeed useful for analyzing what features NN learned in image recognition. For more complex methods you can take a look at this paper (besides everything it contains examples of patterns that NN can learn).
You can not exactly run NN in reverse, because it does not remember all information from source image - only patterns that it learned to detect. So network cannot "imagine a set inputs". However, it possible to sample probability distribution (taking weight as probability of activation of each pixel) and produce a set of patterns that can be recognized by particular neuron.
I know that you can, and I am working on a solution now. I have some code on my github here for imagining the inputs of a neural network that classifies the handwritten digits of the MNIST dataset, but I don't think it is entirely correct. Right now, I simply take a trained network and my desired output and multiply backwards by the learned weights at each layer until I have a value for inputs. This is skipping over the activation function and may have some other errors, but I am getting pretty reasonable images out of it. For example, this is the result of the trained network imagining a 3: number 3
Yes, you can run a probabilistic NN in reverse to get it to 'imagine' inputs that would match an output it's been trained to categorise.
I highly recommend Geoffrey Hinton's coursera course on NN's here:
https://www.coursera.org/course/neuralnets
He demonstrates in his introductory video a NN imagining various "2"s that it would recognise having been trained to identify the numerals 0 through 9. It's very impressive!
I think it's basically doing exactly what you're looking to do.
Gruff

Optimization of Neural Network input data

I'm trying to build an app to detect images which are advertisements from the webpages. Once I detect those I`ll not be allowing those to be displayed on the client side.
Basically I'm using Back-propagation algorithm to train the neural network using the dataset given here: http://archive.ics.uci.edu/ml/datasets/Internet+Advertisements.
But in that dataset no. of attributes are very high. In fact one of the mentors of the project told me that If you train the Neural Network with that many attributes, it'll take lots of time to get trained. So is there a way to optimize the input dataset? Or I just have to use that many attributes?
1558 is actually a modest number of features/attributes. The # of instances(3279) is also small. The problem is not on the dataset side, but on the training algorithm side.
ANN is slow in training, I'd suggest you to use a logistic regression or svm. Both of them are very fast to train. Especially, svm has a lot of fast algorithms.
In this dataset, you are actually analyzing text, but not image. I think a linear family classifier, i.e. logistic regression or svm, is better for your job.
If you are using for production and you cannot use open source code. Logistic regression is very easy to implement compared to a good ANN and SVM.
If you decide to use logistic regression or SVM, I can future recommend some articles or source code for you to refer.
If you're actually using a backpropagation network with 1558 input nodes and only 3279 samples, then the training time is the least of your problems: Even if you have a very small network with only one hidden layer containing 10 neurons, you have 1558*10 weights between the input layer and the hidden layer. How can you expect to get a good estimate for 15580 degrees of freedom from only 3279 samples? (And that simple calculation doesn't even take the "curse of dimensionality" into account)
You have to analyze your data to find out how to optimize it. Try to understand your input data: Which (tuples of) features are (jointly) statistically significant? (use standard statistical methods for this) Are some features redundant? (Principal component analysis is a good stating point for this.) Don't expect the artificial neural network to do that work for you.
Also: remeber Duda&Hart's famous "no-free-lunch-theorem": No classification algorithm works for every problem. And for any classification algorithm X, there is a problem where flipping a coin leads to better results than X. If you take this into account, deciding what algorithm to use before analyzing your data might not be a smart idea. You might well have picked the algorithm that actually performs worse than blind guessing on your specific problem! (By the way: Duda&Hart&Storks's book about pattern classification is a great starting point to learn about this, if you haven't read it yet.)
aplly a seperate ANN for each category of features
for example
457 inputs 1 output for url terms ( ANN1 )
495 inputs 1 output for origurl ( ANN2 )
...
then train all of them
use another main ANN to join results

Neural Net Optimize w/ Genetic Algorithm

Is a genetic algorithm the most efficient way to optimize the number of hidden nodes and the amount of training done on an artificial neural network?
I am coding neural networks using the NNToolbox in Matlab. I am open to any other suggestions of optimization techniques, but I'm most familiar with GA's.
Actually, there are multiple things that you can optimize using GA regarding NN.
You can optimize the structure (number of nodes, layers, activation function etc.).
You can also train using GA, that means setting the weights.
Genetic algorithms will never be the most efficient, but they usually used when you have little clue as to what numbers to use.
For training, you can use other algorithms including backpropagation, nelder-mead etc..
You said you wanted to optimize number hidden nodes, for this, genetic algorithm may be sufficient, although far from "optimal". The space you are searching is probably too small to use genetic algorithms, but they can still work and afaik, they are already implemented in matlab, so no biggie.
What do you mean by optimizing amount of training done? If you mean number of epochs, then that's fine, just remember that training is somehow dependent on starting weights and they are usually random, so the fitness function used for GA won't really be a function.
A good example of neural networks and genetic programming is the NEAT architecture (Neuro-Evolution of Augmenting Topologies). This is a genetic algorithm that finds an optimal topology. It's also known to be good at keeping the number of hidden nodes down.
They also made a game using this called Nero. Quite unique and very amazing tangible results.
Dr. Stanley's homepage:
http://www.cs.ucf.edu/~kstanley/
Here you'll find just about everything NEAT related as he is the one who invented it.
Genetic algorithms can be usefully applied to optimising neural networks, but you have to think a little about what you want to do.
Most "classic" NN training algorithms, such as Back-Propagation, only optimise the weights of the neurons. Genetic algorithms can optimise the weights, but this will typically be inefficient. However, as you were asking, they can optimise the topology of the network and also the parameters for your training algorithm. You'll have to be especially wary of creating networks that are "over-trained" though.
One further technique with a modified genetic algorithms can be useful for overcoming a problem with Back-Propagation. Back-Propagation usually finds local minima, but it finds them accurately and rapidly. Combining a Genetic Algorithm with Back-Propagation, e.g., in a Lamarckian GA, gives the advantages of both. This technique is briefly described during the GAUL tutorial
It is sometimes useful to use a genetic algorithm to train a neural network when your objective function isn't continuous.
I'm not sure whether you should use a genetic algorithm for this.
I suppose the initial solution population for your genetic algorithm would consist of training sets for your neural network (given a specific training method). Usually the initial solution population consists of random solutions to your problem. However, random training sets would not really train your neural network.
The evaluation algorithm for your genetic algorithm would be a weighed average of the amount of training needed, the quality of the neural network in solving a specific problem and the numer of hidden nodes.
So, if you run this, you would get the training set that delivered the best result in terms of neural network quality (= training time, number hidden nodes, problem solving capabilities of the network).
Or are you considering an entirely different approach?
I'm not entirely sure what kind of problem you're working with, but GA sounds like a little bit of overkill here. Depending on the range of parameters you're working with, an exhaustive (or otherwise unintelligent) search may work. Try plotting your NN's performance with respect to number of hidden nodes for a first few values, starting small and jumping by larger and larger increments. In my experience, many NNs plateau in performance surprisingly early; you may be able to get a good picture of what range of hidden node numbers makes the most sense.
The same is often true for NNs' training iterations. More training helps networks up to a point, but soon ceases to have much effect.
In the majority of cases, these NN parameters don't affect performance in a very complex way. Generally, increasing them increases performance for a while but then diminishing returns kick in. GA is not really necessary to find a good value on this kind of simple curve; if the number of hidden nodes (or training iterations) really does cause the performance to fluctuate in a complicated way, then metaheuristics like GA may be apt. But give the brute-force approach a try before taking that route.
I would tend to say that genetic algorithms is a good idea since you can start with a minimal solution and grow the number of neurons. It is very likely that the "quality function" for which you want to find the optimal point is smooth and has only few bumps.
If you have to find this optimal NN frequently I would recommend using optimization algorithms and in your case quasi newton as described in numerical recipes which is optimal for problems where the function is expensive to evaluate.