How to create a proper feedforward neural network with an evolutionary algorithm

I've created a 2D game where you use a map editor to place cars, obstacles, and a destination point that the cars should reach.
The idea is that these cars will be controlled by generations of feedforward neural networks. But I'm not sure how the information should be represented in the input layer, or what exactly the evolution should look like, so I'll explain my idea; it would be great to get advice on how to improve it, especially if some part of it won't work at all.
Input layer values:
Neurons with values in (0, 1) representing distances to obstacles
A neuron with a value in (-1, 1) representing the car's speed and direction (-1 = max speed backwards, 0 = no speed, 0.5 = half of max speed forward)
Two neurons with values in (-1, 1) representing the cos and sin (or 2*asin/pi and 2*(acos - pi/2)/pi) of the vector from the car to the destination, relative to some fixed canvas (map) axis.
Output layer values:
A neuron in (-1, 1) representing the car's acceleration and its direction
A neuron in (-1, 1) representing which way and how fast the car will turn
Looking at these values, I'm thinking about using the tanh function everywhere. But is it a good idea to use a single negative/positive value to encode a direction (as input or output)? Or is it better to use two neurons to tell the neural network where it should go (which obviously can't be expressed with a single value), and so on?
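For what it's worth, here is a minimal sketch of a single-hidden-layer controller with tanh everywhere, using the encoding described above; the layer sizes, the number of distance sensors (three), and all function names are illustrative assumptions, not a recommendation:

import numpy as np

def init_network(n_in, n_hidden, n_out, rng):
    """Random weights and biases for a one-hidden-layer tanh network."""
    return {
        "W1": rng.normal(0.0, 0.5, (n_hidden, n_in)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, 0.5, (n_out, n_hidden)),
        "b2": np.zeros(n_out),
    }

def forward(net, x):
    """tanh keeps hidden and output activations in (-1, 1)."""
    h = np.tanh(net["W1"] @ x + net["b1"])
    return np.tanh(net["W2"] @ h + net["b2"])   # [acceleration, steering]

rng = np.random.default_rng(0)
net = init_network(n_in=6, n_hidden=8, n_out=2, rng=rng)

# Illustrative input: three distance readings in (0, 1), speed in (-1, 1),
# and cos/sin of the direction from the car to the destination.
x = np.array([0.8, 0.3, 0.9, 0.5, 0.7, -0.7])
acceleration, steering = forward(net, x)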
I imagine the evolution itself mostly as swapping some weight and bias values between the best networks (selected by fitness) and adding small random numbers to some weights and biases as mutation (where the magnitude of the random numbers depends on fitness, to avoid destructive, large changes in good networks).
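And here is a rough sketch of the crossover/mutation step just described, assuming each network's weights and biases have been flattened into a single NumPy vector (a "genome"); the rates and scales are placeholders:

import numpy as np

rng = np.random.default_rng(1)

def crossover(parent_a, parent_b):
    """Uniformly swap individual weights/biases between two flat genomes."""
    mask = rng.random(parent_a.size) < 0.5
    return np.where(mask, parent_a, parent_b)

def mutate(genome, fitness, best_fitness, base_sigma=0.1, rate=0.2):
    """Add small Gaussian noise to a random subset of genes.
    Genomes close to the best fitness get smaller perturbations."""
    sigma = base_sigma * max(1.0 - fitness / best_fitness, 0.0)
    mask = rng.random(genome.size) < rate
    return genome + mask * rng.normal(0.0, sigma, size=genome.size)

# Example: breed a child from two parent genomes of 64 parameters each.
a, b = rng.normal(size=64), rng.normal(size=64)
child = mutate(crossover(a, b), fitness=8.0, best_fitness=10.0)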

Related

Meaning of bias with zero inputs in a perceptron in ANNs

I'm a student in a graduate computer science program. Yesterday we had a lecture about neural networks.
I think I understood the individual parts of a perceptron in neural networks, with one exception. I've already done my research on the bias in a perceptron, but I still don't get it.
So far I know that, with the bias, I can shift the weighted sum over the inputs of a perceptron, so that the activation function (e.g. sigmoid) fires only when the sum minus a specific bias exceeds the activation threshold.
But on the presentation slides from my professor he mentioned something like this:
The bias is added to the perceptron to avoid issues where all inputs
could be equal to zero - no multiplicative weight would have an effect
I can't figure out the meaning behind this sentence, or why it is important that the sum over all weighted inputs cannot be equal to zero. If all inputs are equal to zero, there should be no impact on the perceptrons in the next hidden layer, right? Furthermore, this perceptron would then be a constant value for backpropagation and would have no influence on updating the weights of that perceptron.
Or am I wrong?
Does anyone have an explanation for this?
Thanks in advance.
Bias
A bias is essentially an offset.
Imagine the simple case of a single perceptron, with a relationship between the input and the output, say:
y = 2x + 3
Without the bias term, the perceptron could match the slope (often called the weight) of "2", meaning it could learn:
y = 2x
but it could not match the "+ 3" part.
Although this is a simple example, this logic scales to neural networks in general. The neural network can capture nonlinear functions, but often it needs an offset to do so.
What you asked
What your professor said is another good example of why an offset is needed. Imagine all the inputs to a perceptron are 0. A perceptron's output is the sum of its inputs, each multiplied by a weight. This means every weight is multiplied by 0 and the products are added together, so the result will always be 0.
With a bias, however, the output could still retain a value.
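A tiny numeric illustration of both points (the "+ 3" offset and the all-zero-inputs case); the numbers are arbitrary:

import numpy as np

# A single "neuron" computing w*x + b can represent y = 2x + 3 ...
w, b = 2.0, 3.0
print(w * 1.5 + b)                  # 6.0, i.e. y = 2x + 3 at x = 1.5

# ... and with all inputs at zero, only the bias keeps the output non-zero.
x = np.zeros(3)
weights = np.array([0.4, -1.2, 0.7])
print(np.tanh(weights @ x))         # 0.0  (any weights give 0 here)
print(np.tanh(weights @ x + 0.5))   # ~0.46, thanks to the bias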

Modeling a relationship between sensor values and position (angle and distance) relative to a target

I want to derive a simple model that can predict the current position of an object with respect to a target.
To be more specific, I have a head with 4 identical light sensors placed 90 degrees apart. There is a light source (LED) emitting visible light. Since each sensor has an angular sensitivity profile (maximum at 90 degrees, decreasing as the angle of incidence of the light increases), the value received at each sensor is determined by the angle and distance of the head with respect to the target.
I measured the values at four sensors at various angles and distances.
Each sensor reads a maximum value of around 9.5 when the incoming light is low (either the sensor is far from the target or it faces away from the target), while the value decreases as the sensor gets closer to the target or faces directly toward it.
My inputs and outputs look like:
[0.1234 0.0124 8.342 9.232] = [angle, distance]: an example of the head placed next to the light and facing toward it.
Four inputs from the sensors and two outputs for the angle and distance.
What strategy can I use to derive an equation for predicting the angle and distance from the current incoming sensor values?
I was thinking of multivariate regression, but my outputs are not single scalars (they are more like vectors), so I am not sure it will work.
Therefore, I am writing here to ask for some help.
Any help would be appreciated.
Thanks
Your idea about multivariate regression looks reasonable.
IMHO you need to train two models instead of one. The first one will predict angle, and the second one will predict distance.
Why would you want to combine these two models? It looks strange in terms of the optimization metric: when you build the angle model you minimize the error in radians, and when you build the distance model you minimize the error in meters. So what metric would you minimize in the single-model case?
I believe the following links will be useful for you:
https://www.mathworks.com/help/curvefit/surface-fitting.html
https://www.mathworks.com/help/matlab/math/example-curve-fitting-via-optimization.html
Note: in some cases data normalization (for example via z-score) greatly improves the fitting performance.
P.S. Also try asking at https://stats.stackexchange.com/
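To make the two-separate-models suggestion concrete, here is a minimal least-squares sketch in Python (the MATLAB links above describe the analogous curve-fitting tools); the data here is randomly generated just to show the shapes, and linear features are only a starting point, since the sensor response is probably nonlinear:

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical training data: rows of four sensor readings plus the
# measured angle and distance for each row (replace with real measurements).
X = rng.uniform(0.0, 9.5, size=(200, 4))
angles = rng.uniform(-np.pi, np.pi, size=200)
distances = rng.uniform(0.1, 2.0, size=200)

# z-score the inputs (as the note above suggests, this often helps).
mu, sd = X.mean(axis=0), X.std(axis=0)
Xn = (X - mu) / sd

# Two independent least-squares fits: one for the angle, one for the distance.
A = np.column_stack([Xn, np.ones(len(Xn))])          # linear features + intercept
w_angle, *_ = np.linalg.lstsq(A, angles, rcond=None)
w_dist, *_ = np.linalg.lstsq(A, distances, rcond=None)

def predict(sensors):
    f = np.append((sensors - mu) / sd, 1.0)
    return f @ w_angle, f @ w_dist                    # (angle, distance)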

2-output neural network?

I've been thinking about this for a while but I can't seem to find any material on it. When classifying with a neural network you usually assign regions of the output neuron's activation function to specific classes, e.g. for tanh you could use 0.8 for class 1 and -0.8 for class 2. This is all well and good if you have up to 3 classes (the third class can be around zero), but when you have more classes things can become tricky.
Take an example where you are classifying football players based on their statistics. An attacking midfield player and a striker have similar statistics, but if you assign them to regions on opposite sides of the activation function, the accuracy of the classifier is surely harmed.
Would it not be easier to have a 2-output neural network that outputs an arbitrary x and a y value such that the class regions could be represented in 2D rather than 1D? You could essentially have a circle, cut into the number of classes you want and have the centre of each slice as the target value for the class. This seems like a good way to classify to me but the lack of relevant data on the subject is leading me to believe there are easier ways to perform classification with a higher number of classes (say 6 classes for example). The reason I ask is because I am trying to classify football players in certain positions based on their stats. You can see a scatter plot of the top 2 principal component scores for players below.
The usual approach is to use one neuron for every class. You will then find the answer with "argmax".
You don't gain much by encoding 2 or 3 values with a single neuron.
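A small sketch of the one-neuron-per-class idea with softmax probabilities and argmax; six classes stand in for the six player positions, and the raw outputs are made up:

import numpy as np

# Raw outputs of six output neurons (one per class), e.g. for six positions.
logits = np.array([0.2, 1.7, -0.3, 0.9, 0.1, -1.0])

# Softmax turns them into class probabilities ...
probs = np.exp(logits - logits.max())
probs /= probs.sum()

# ... and argmax picks the predicted class, so similar classes can both
# score highly without being forced to opposite ends of one activation.
predicted_class = int(np.argmax(probs))
print(predicted_class, probs.round(3))   # 1 and the probability vector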

Training data range for Neural Network

Is it better for a neural network to use a smaller range of training data, or does it not matter? For example, if I want to train an ANN with angles (float values), should I pass those values in degrees [0; 360] or in radians [0; 6.28], or should all values be normalized to the range [0; 1]? Does the range of the training data affect the ANN's learning quality?
My neural network has 6 input neurons and 1 hidden layer, and I am using the symmetric sigmoid activation function (tanh).
For the neural network it doesn't matter whether the data is normalised.
However, the performance of the training method can vary a lot.
In a nutshell: typically the methods prefer variables which have larger values. This might send the training method off-track.
Crucial for most NN training methods is that all dimensions of the training data have the same domain. If all your variables are angles, it doesn't matter whether they are in [0, 1), [0, 2*pi), or [0, 360), as long as they share the same domain. However, you should avoid having one variable for the angle in [0, 2*pi) and another variable for the distance in mm, where the distance can be much larger, e.g. 2,000,000 mm.
Two cases where an algorithm might suffer in these cases:
(a) regularisation: if the weights of the NN are forced to be small, a tiny change to a weight attached to a large-domain input variable has a much larger impact than one attached to a small-domain variable.
(b) gradient descent: if the step size is limited you have similar effects.
Recommendation: all variables should have the same domain size; whether it is [0, 1] or [0, 2*pi] or something else doesn't matter.
Addition: for many domains, "z-score normalisation" works extremely well.
The range of the data points affects the way you train a model. Suppose the range of values of the features in the data set is not normalized. Then, depending on your data, you may end up with elongated ellipses of data points in the feature space, and the learning model will have a very hard time learning the manifold on which the data points lie (i.e. the underlying distribution). Also, in most cases the data points are sparsely spread in the feature space if not normalized (see this). So the take-home message is to normalize the features when possible.
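A short sketch of the z-score normalisation mentioned above, assuming the training data is a NumPy array with one column per input variable (here an angle column and a distance column, illustrating the mismatched-domain problem):

import numpy as np

X = np.array([[0.5, 1800000.0],      # [angle in radians, distance in mm]
              [3.1,  250000.0],
              [6.0, 2000000.0]])

mean, std = X.mean(axis=0), X.std(axis=0)
X_norm = (X - mean) / std            # every column now has mean 0, std 1

# Apply the *same* mean/std to any new sample before feeding the network.
new_sample = (np.array([1.2, 900000.0]) - mean) / std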

Few questions about kohonen neural network

I have a big data set (time series, about 50 parameters/values). I want to use a Kohonen network to group similar data rows. I've read a bit about Kohonen neural networks and I understand the idea, but:
I don't know how to implement a Kohonen network with so many dimensions. I found an example on CodeProject, but only with a 2- or 3-dimensional input vector. When I have 50 parameters, should I create 50 weights in my neurons?
I don't know how to update the weights of the winning neuron (how do I calculate the new weights?).
My English is not perfect and I don't understand everything I read about Kohonen networks, especially the descriptions of the variables in the formulas; that's why I'm asking.
One should distinguish between the dimensionality of the map, which is usually low (e.g. 2 in the common case of a rectangular grid), and the dimensionality of the reference vectors, which can be arbitrarily high without problems.
Look at http://www.psychology.mcmaster.ca/4i03/demos/competitive-demo.html for a nice example with 49-dimensional input vectors (7x7 pixel images). The Kohonen map in this case has the form of a one-dimensional ring of 8 units.
See also http://www.demogng.de for a java simulator for various Kohonen-like networks including ring-shaped ones like the one at McMasters. The reference vectors, however, are all 2-dimensional, but only for easier display. They could have arbitrary high dimensions without any change in the algorithms.
Yes, you would need 50 weights per neuron (one per input dimension). However, these types of networks usually have a low-dimensional map, as described in this self-organizing map article. I have never seen them used with more than a few inputs.
You have to use an update formula. From the same article: Wv(s + 1) = Wv(s) + Θ(u, v, s) α(s) (D(t) - Wv(s)), where Wv is the weight vector of unit v, u is the index of the best-matching (winning) unit, s is the step index, Θ is the neighbourhood function, α is the learning rate, and D(t) is the input vector.
Yes, you'll need 50 weights (inputs) for each neuron.
You basically do a linear interpolation between each neuron's weights and the target (input) vector, using W(s + 1) = W(s) + Θ() * α(s) * (Input(t) - W(s)), with Θ being your neighbourhood function.
And you should update all your neurons, not only the winner.
Which function you use as the neighbourhood function depends on your actual problem.
A common property of such a function is that it has a value of 1 when i = k and falls off with the Euclidean distance. Additionally, it shrinks over time (in order to localize clusters).
Simple neighbourhood functions include linear interpolation (up to a "maximum distance") or a Gaussian function.
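To tie the formula and these points together, here is a minimal sketch of one Kohonen/SOM update step with 50-dimensional reference vectors on a one-dimensional ring of 8 units (mirroring the demo linked above) and a Gaussian neighbourhood; all constants are illustrative:

import numpy as np

n_units, dim = 8, 50                       # ring of 8 units, 50 parameters per row
rng = np.random.default_rng(0)
W = rng.normal(size=(n_units, dim))        # reference (weight) vectors, one per unit

def som_step(W, x, s, alpha0=0.5, sigma0=2.0, tau=200.0):
    """One update: find the winning unit, then pull every unit toward x,
    weighted by a Gaussian neighbourhood that shrinks with time step s."""
    winner = np.argmin(np.linalg.norm(W - x, axis=1))
    idx = np.arange(len(W))
    # Distance along the ring between each unit and the winner.
    ring_dist = np.minimum(np.abs(idx - winner), len(W) - np.abs(idx - winner))
    sigma = sigma0 * np.exp(-s / tau)      # neighbourhood radius shrinks over time
    alpha = alpha0 * np.exp(-s / tau)      # learning rate shrinks over time
    theta = np.exp(-ring_dist**2 / (2 * sigma**2))
    return W + alpha * theta[:, None] * (x - W)   # W(s+1) = W(s) + Θ·α·(x − W(s))

x = rng.normal(size=dim)                   # one input row with 50 values
W = som_step(W, x, s=0)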