Neural network to solve AND

I'm working on implementing a backpropagation algorithm. Initially I trained my network to solve XOR to verify that it works correctly before using it for my design. After reading this I decided to train it to solve the AND gate first. I'm using a sigmoid as the transfer function and MSE to calculate the total error. I tried learning rates ranging from 0.01 to 0.5, and I trained the network several times, each time for a different number of iterations between 100 and 1000. The minimum total error I got was 0.08. Is this an acceptable error?
My second question: should I use a threshold instead of a sigmoid to solve the AND gate? If yes, what is a suitable threshold?
Third, should I set a limit on the initial weights, for example between -1 and 1?
Thanks in advance.
EDIT 1
I think the output is odd.
Here is the output after the first iteration:
Target: 0.0 Output: 0.5314680723170211
Target: 0.0 Output: 0.7098671414869142
Target: 0.0 Output: 0.625565435381579
Target: 1.0 Output: 0.7827456263767251
and the output after the 400th iteration:
Target: 0.0 Output: 0.2826892072063843
Target: 0.0 Output: 0.4596476713717095
Target: 0.0 Output: 0.3675222634971935
Target: 1.0 Output: 0.5563197014845178
EDIT 2
Here is the part of my code that does the backpropagation:
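for (int i = 0; i < currentLayer.getSize(); i++)
{
    temp = currentLayer.getAt(i);
    // output-layer delta: sigmoid derivative out*(1-out) times the output error
    err = temp.getOutput() * (1 - temp.getOutput()) * outErr[i];
    // if roundTwoDecimals rounds to two decimal places, any delta smaller than 0.005 becomes 0
    temp.setError(roundTwoDecimals(err));
}
for (int i = 0; i < currentLayer.getSize(); i++)
{
    temp = currentLayer.getAt(i); // get a neuron at the output layer
    // update the connections: w += learningRate * delta * input
    for (int j = 0; j < temp.getInConnections().size(); j++)
    {
        inputCon = temp.getInputConnectionAt(j);
        newW = inputCon.getWeight() + inputCon.getDst().getError() * inputCon.getInput() * this.learningRate;
        // rounding the new weight to two decimals can cancel updates smaller than about 0.005
        inputCon.setWeight(roundTwoDecimals(newW));
    }
    // now update the bias (its input is a constant 1)
    temp.setBias(temp.getBias() + (this.learningRate * temp.getError()));
}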

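0.08 is pretty low, but AND is linearly separable and should be perfectly solvable, so an error arbitrarily close to 0 should be possible (a sigmoid never outputs exactly 0 or 1, so the error never reaches exactly 0). Your iterations and learning rates seem reasonable too. What is the topology of your network? Are you including a bias node?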
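Standard backpropagation doesn't work well with thresholds: the derivative of a step function is zero everywhere except at the threshold itself, where it is undefined, so no useful gradient reaches the weights. That is why thresholds aren't usually used. If you want to try one as a debugging test, you could use the perceptron training rule and a threshold of 0.5 (which is pretty standard).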
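If you want to try the perceptron rule as a debugging test, here is a minimal Java sketch. It assumes the 0.5 threshold is applied to the weighted sum of the two inputs with no separate bias, and the class and method names are hypothetical. For AND the rule stops changing the weights as soon as one full pass classifies every pattern correctly.

```java
import java.util.Arrays;

// Minimal sketch (assumption): one threshold unit, two inputs, no bias term,
// fixed firing threshold of 0.5; only the two weights are learned.
class PerceptronAnd {
    static final double THRESHOLD = 0.5;

    // Hard-limit activation: 1 if the weighted sum reaches the threshold, else 0.
    static int step(double net) {
        return net >= THRESHOLD ? 1 : 0;
    }

    static int predict(double[] w, int[] x) {
        double net = 0.0;
        for (int i = 0; i < x.length; i++) {
            net += w[i] * x[i];
        }
        return step(net);
    }

    // Perceptron rule: w_i += rate * (target - output) * x_i, repeated until a
    // full pass over the patterns produces no errors (or maxEpochs is reached).
    static double[] train(int[][] inputs, int[] targets, double rate, int maxEpochs) {
        double[] w = new double[inputs[0].length]; // weights start at zero
        for (int epoch = 0; epoch < maxEpochs; epoch++) {
            int errors = 0;
            for (int p = 0; p < inputs.length; p++) {
                int error = targets[p] - predict(w, inputs[p]);
                if (error != 0) {
                    errors++;
                    for (int i = 0; i < w.length; i++) {
                        w[i] += rate * error * inputs[p][i];
                    }
                }
            }
            if (errors == 0) {
                break; // every pattern is classified correctly
            }
        }
        return w;
    }

    public static void main(String[] args) {
        int[][] inputs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        int[] targets = {0, 0, 0, 1};
        double[] w = train(inputs, targets, 0.1, 100);
        System.out.println("weights = " + Arrays.toString(w));
        for (int[] x : inputs) {
            System.out.println(Arrays.toString(x) + " -> " + predict(w, x));
        }
    }
}
```

Because AND only needs each single input to stay below 0.5 while the pair reaches it, small equal weights such as 0.3 and 0.3 already solve it.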
Yes, constraining initial weights to be between -1 and 1 is probably a good idea. For simple logic tasks, people usually don't allow weights to go outside of that range at all, although in principle I don't think it should be a problem.
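For comparison with the backpropagation code in the question, here is a minimal Java sketch of a single sigmoid neuron (two inputs plus a bias, no hidden layer) trained on AND with per-pattern updates. Weights and bias start uniformly in [-1, 1] and are never rounded. The topology, learning rate of 0.5, 10000 epochs and seed are assumptions for illustration, not the asker's setup.

```java
import java.util.Random;

// Minimal sketch (assumption): a single sigmoid output neuron with a bias and
// no hidden layer, which is enough for AND because AND is linearly separable.
class SigmoidAnd {
    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    // w[0], w[1] are the input weights; w[2] is the bias.
    static double output(double[] w, double[] x) {
        return sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]);
    }

    static double mse(double[] w, double[][] inputs, double[] targets) {
        double sum = 0.0;
        for (int p = 0; p < inputs.length; p++) {
            double e = targets[p] - output(w, inputs[p]);
            sum += e * e;
        }
        return sum / inputs.length;
    }

    // Online (per-pattern) gradient descent; weights start in [-1, 1) and
    // are never rounded, so even tiny updates accumulate.
    static double[] train(double[][] inputs, double[] targets, double rate, int epochs, long seed) {
        Random rng = new Random(seed);
        double[] w = new double[3];
        for (int i = 0; i < w.length; i++) {
            w[i] = rng.nextDouble() * 2.0 - 1.0; // uniform in [-1, 1)
        }
        for (int epoch = 0; epoch < epochs; epoch++) {
            for (int p = 0; p < inputs.length; p++) {
                double out = output(w, inputs[p]);
                // Same delta as in the question: out * (1 - out) * (target - out)
                double delta = out * (1.0 - out) * (targets[p] - out);
                w[0] += rate * delta * inputs[p][0];
                w[1] += rate * delta * inputs[p][1];
                w[2] += rate * delta; // the bias input is a constant 1
            }
        }
        return w;
    }

    public static void main(String[] args) {
        double[][] inputs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[] targets = {0, 0, 0, 1};
        double[] w = train(inputs, targets, 0.5, 10000, 42L);
        System.out.printf("MSE = %.5f%n", mse(w, inputs, targets));
        for (int p = 0; p < inputs.length; p++) {
            System.out.printf("Target: %.1f Output: %.4f%n", targets[p], output(w, inputs[p]));
        }
    }
}
```

Without rounding, the error keeps shrinking as the weights grow past the initial [-1, 1] range, so the MSE can be pushed well below 0.08.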

Related

System Dynamics simulation - Translating Stella into AnyLogic syntax

I modelled the following logic in Stella:
IF "cause" > 0 THEN MONTECARLO("probabilityofconsequence") ELSE 0
But I'm not getting the syntax right in AnyLogic:
(cause > 0) ? (uniform() < probabilityofconsequence) ? 1 : 0 : 0
Any ideas?
Disclaimer:
What Stella does with the MONTECARLO function is generate a series of zeros and ones from a Bernoulli distribution based on the probability provided. The probability is the percentage probability of an event happening per DT, divided by DT (it is similar to, but not the same as, the percent probability of an event per unit time). The probability value can be either a variable or a constant, but should evaluate to a number between 0 and 100/DT (numbers outside the range will be set to 0 or 100/DT). The expected value of the stream of numbers generated, summed over a unit time, is equal to probability/100.
MONTECARLO is equivalent to the following logic:
IF UNIFORM(0,100,<seed>) < probability*DT THEN 1 ELSE 0
The equivalent in AnyLogic should be:
cause>0 && uniform(0,100) < probability*DT ? 1 : 0
You need to create a variable called DT equal to either the fixed time step you have chosen in your model configuration or a value you consider adequate.
Depending on how you run the model, AnyLogic doesn't necessarily treat the fixed time step as fixed, so you need to define DT yourself.
Either way, you probably won't get results exactly equal to Stella's, since the time steps are not necessarily the same, but they may be similar enough to satisfy you.
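To check the MONTECARLO logic outside either tool, here is a minimal Java sketch. It assumes java.util.Random as a stand-in for AnyLogic's uniform(0,100), a hypothetical fixed DT of 0.25, and the clamping to [0, 100/DT] described for Stella. Each step fires with probability probability*DT/100 and there are 1/DT steps per unit time, so the expected sum per unit time is probability/100.

```java
import java.util.Random;

// Minimal sketch (assumptions): java.util.Random replaces AnyLogic's uniform(),
// and dt is a hypothetical fixed time step chosen by the modeller.
class MonteCarloStep {
    private final Random rng;
    private final double dt;

    MonteCarloStep(double dt, long seed) {
        this.dt = dt;
        this.rng = new Random(seed);
    }

    // Returns 1 with probability (probability * DT) / 100, else 0.
    int montecarlo(double probability) {
        // Clamp to [0, 100/DT] as described for Stella.
        double p = Math.max(0.0, Math.min(probability, 100.0 / dt));
        double u = rng.nextDouble() * 100.0; // uniform in [0, 100)
        return u < p * dt ? 1 : 0;
    }

    // cause > 0 && uniform(0,100) < probability*DT ? 1 : 0
    int consequence(double cause, double probability) {
        return cause > 0 && montecarlo(probability) == 1 ? 1 : 0;
    }

    // Average number of events per unit time over `units` unit times.
    double averagePerUnitTime(double probability, int units) {
        int steps = (int) Math.round(units / dt);
        long total = 0;
        for (int i = 0; i < steps; i++) {
            total += montecarlo(probability);
        }
        return (double) total / units;
    }

    public static void main(String[] args) {
        MonteCarloStep mc = new MonteCarloStep(0.25, 1L);
        // Expected value per unit time is probability / 100 = 0.2
        System.out.println(mc.averagePerUnitTime(20.0, 10000));
    }
}
```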

Should the output of backpropagation converge to 1 given the output is in (0,1)?

I am currently trying to understand the ANN that I created for an assignment. It takes grayscale images (120x128 pixels, values 0-150) and determines whether the person is male or female. It works for the most part. I am treating this as a boolean problem where the output is Male = 1, Female = 0. I am able to get the ANN to correctly identify male or female. However, the outputs I am getting for males are between 0.3 and 0.6, depending on the run. Should I be getting values of ~1?
I am using a sigmoid unit 1/(1+e^-y) and have tried taking the inverse. I have tried this using 5 to 60 hidden units in 1 layer, and tried 2 outputs, with flip-flopping results. I want to understand this so that I can apply it to a non-boolean problem, i.e. if I want a numerical output, how would I go about doing that, or am I using the wrong machine learning technique?
You can use a binary function at the output with a threshold. Assuming you assigned 0 for female and 1 for male in training, while testing you will get values between 0 and 1 (with a sigmoid output unit they stay strictly inside that range). To make a decision, apply a threshold of 0.5 to the output value: if it is less than 0.5, the estimated class is female, and if it is equal to or greater than 0.5, the estimated class is male.
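A minimal Java sketch of that decision rule, assuming a single sigmoid output unit trained with Male = 1 and Female = 0. The class name and the sample outputs are hypothetical.

```java
// Minimal sketch (assumption): one sigmoid output unit trained with
// Male = 1 and Female = 0; the sample outputs below are hypothetical.
class GenderThreshold {
    static final double THRESHOLD = 0.5;

    // Outputs at or above the threshold are read as male, below it as female.
    static String classify(double output) {
        return output >= THRESHOLD ? "male" : "female";
    }

    public static void main(String[] args) {
        double[] outputs = {0.31, 0.49, 0.5, 0.62}; // hypothetical network outputs
        for (double o : outputs) {
            System.out.println(o + " -> " + classify(o));
        }
    }
}
```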

MSE in neuralnet results and roc curve of the results

Hi, my question is a bit long; please bear with me and read it to the end.
I am working on a project with 30 participants. We have two data sets: the first has 30 rows and 160 columns, and the second has the same 30 rows and 200 columns, which are the outputs (y); these outputs are independent. I want to use the first data set to predict the outputs in the second. Because the first data set is high-dimensional (far more columns than rows), I used factor analysis and now have 19 factors that explain up to 98% of the variance. Now I want to use these 19 factors to predict the outputs of the second data set.
I am using neuralnet with backpropagation, and everything goes well: my predictions are really close to the real outputs.
My questions :
1- My inputs are the factors (between -1 and 1) and my outputs are integers between 4 and 10000. Should I still scale them before running the neural network?
2- I scaled the data (both inputs and outputs) and then predicted with neuralnet. When I checked the MSE it was very high, around 6000, even though my predictions and the real outputs are very close to each other. But if I rescale the predictions and outputs and then check the MSE, it is near zero. Is it unbiased to rescale and then check the MSE?
3- I read that it is better not to scale the outputs at all, but if I scale only the inputs, all my predictions are 1. Is it correct not to scale the outputs?
4- If I want to plot the ROC curve, how can I do it, given that my results are never exactly equal to the real outputs?
Thank you for reading my question
[edit#1]: There is a publication on how to produce ROC curves using neural network results
http://www.lcc.uma.es/~jja/recidiva/048.pdf
1) You can scale your values (using min-max scaling, for example), but compute the scaling on your training data set only. Save the parameters used in the scaling process (for min-max scaling, these are the min and max values by which the data is scaled). Only then scale your test data set WITH the min and max values you got from the training data set. Remember, with the test data set you are trying to mimic the process of classifying unseen data, and unseen data is scaled with the scaling parameters from the training data set.
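Fitting the scaling on the training set only can be sketched in Java as below. The class name and sample values are hypothetical; the inverse transform is what you would use to bring predictions back to the original output units (4 to 10000 in the question).

```java
import java.util.Arrays;

// Minimal sketch (hypothetical class): min and max are learned from the
// training data only and then reused for test data and for undoing the scaling.
class MinMaxScaler {
    private double min;
    private double max;

    MinMaxScaler fit(double[] train) {
        min = Arrays.stream(train).min().getAsDouble();
        max = Arrays.stream(train).max().getAsDouble();
        if (max == min) {
            throw new IllegalArgumentException("training data has zero range");
        }
        return this;
    }

    // Maps training values into [0, 1]; test values outside the training
    // range land outside [0, 1], which is expected.
    double[] transform(double[] x) {
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = (x[i] - min) / (max - min);
        }
        return out;
    }

    // Undo the scaling, e.g. to report predictions in the original units.
    double[] inverseTransform(double[] scaled) {
        double[] out = new double[scaled.length];
        for (int i = 0; i < scaled.length; i++) {
            out[i] = scaled[i] * (max - min) + min;
        }
        return out;
    }

    public static void main(String[] args) {
        double[] trainY = {4, 250, 1200, 10000}; // hypothetical training outputs
        double[] testY = {30, 9000};             // hypothetical test outputs
        MinMaxScaler scaler = new MinMaxScaler().fit(trainY);
        double[] scaledTest = scaler.transform(testY);
        System.out.println(Arrays.toString(scaledTest));
        System.out.println(Arrays.toString(scaler.inverseTransform(scaledTest)));
    }
}
```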
2) When talking about errors, do mention which data set the error was computed on. You can compute an error function (there are several; one of them is the mean squared error, or MSE) on the training data set, and another on your test data set.
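On question 2, it also matters which units the MSE is computed in. With min-max scaling y' = (y - min)/(max - min), every error is divided by (max - min), so MSE_original = (max - min)^2 * MSE_scaled, and an MSE on scaled data is not directly comparable with one on the original data. A minimal Java check of that identity, with hypothetical values:

```java
// Minimal sketch: the same predictions give different MSE values depending on
// whether the error is computed on scaled or original units.
class MseScale {
    static double mse(double[] predicted, double[] actual) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double e = predicted[i] - actual[i];
            sum += e * e;
        }
        return sum / actual.length;
    }

    // Min-max scaling with given (training-set) min and max.
    static double[] scale(double[] y, double min, double max) {
        double[] out = new double[y.length];
        for (int i = 0; i < y.length; i++) {
            out[i] = (y[i] - min) / (max - min);
        }
        return out;
    }

    public static void main(String[] args) {
        double min = 4, max = 10000;              // hypothetical training-set range
        double[] actual = {100, 2500, 8000};      // hypothetical outputs
        double[] predicted = {110, 2450, 8100};   // hypothetical predictions
        double raw = mse(predicted, actual);
        double scaled = mse(scale(predicted, min, max), scale(actual, min, max));
        System.out.println("original units: " + raw);
        System.out.println("scaled units:   " + scaled);
        System.out.println("scaled * (max - min)^2: " + scaled * (max - min) * (max - min));
    }
}
```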
4) Think about this: let's say you train a network with the training data set, and it only has 1 neuron in the output layer. Then you present it with the test data set. Depending on which transfer function (activation function) you use in the output layer, you will get a value for each exemplar. Let's assume you use a sigmoid transfer function, whose max and min values are 1 and 0. That means the predictions will be limited to values between 0 and 1.
Let's also say that your target labels ("truth") only contain the discrete values 0 and 1 (indicating which class the exemplar belongs to).
targetLabels=[0 1 0 0 0 1 0 ];
NNprediction=[0.2 0.8 0.1 0.3 0.4 0.7 0.2];
How do you interpret this?
You can apply a hard-limiting function so that the NNprediction vector only contains the discrete values 0 and 1. Let's say you use a threshold of 0.5:
NNprediction_thresh_05 = [0 1 0 0 0 1 0];
vs.
targetLabels =[0 1 0 0 0 1 0];
With this information you can compute your false positives (FP), false negatives (FN), true positives (TP) and true negatives (TN), plus a number of derived metrics such as the True Positive Rate = TP/(TP+FN).
A ROC curve plots the True Positive Rate against the False Positive Rate = FP/(FP+TN), and a single threshold gives only a single point in that plot. However, if you vary the threshold in the hard-limit function, you get all the points you need for a complete curve.
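The threshold sweep can be sketched in Java using the example targetLabels and NNprediction vectors above. Each threshold yields one (FPR, TPR) point; the class name is hypothetical and the sketch assumes both classes are present in the labels.

```java
// Minimal sketch: sweep the hard-limit threshold and compute one
// (false positive rate, true positive rate) point per threshold.
class RocSweep {
    // Returns {FPR, TPR}; assumes labels contain at least one 0 and one 1.
    static double[] rocPoint(int[] labels, double[] scores, double threshold) {
        int tp = 0, fp = 0, tn = 0, fn = 0;
        for (int i = 0; i < labels.length; i++) {
            boolean predictedPositive = scores[i] >= threshold;
            if (predictedPositive && labels[i] == 1) tp++;
            else if (predictedPositive) fp++;
            else if (labels[i] == 1) fn++;
            else tn++;
        }
        double tpr = (double) tp / (tp + fn); // TP / (TP + FN)
        double fpr = (double) fp / (fp + tn); // FP / (FP + TN)
        return new double[] {fpr, tpr};
    }

    public static void main(String[] args) {
        int[] targetLabels = {0, 1, 0, 0, 0, 1, 0};
        double[] nnPrediction = {0.2, 0.8, 0.1, 0.3, 0.4, 0.7, 0.2};
        for (int k = 0; k <= 10; k++) {
            double threshold = k / 10.0;
            double[] point = rocPoint(targetLabels, nnPrediction, threshold);
            System.out.printf("threshold %.1f: FPR %.2f, TPR %.2f%n", threshold, point[0], point[1]);
        }
    }
}
```

At threshold 0.5 this gives the point FPR 0, TPR 1, matching the hard-limited vector above; lowering the threshold moves the point toward (1, 1) and raising it moves it toward (0, 0).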
Makes sense? See the dependencies of one process on the others?

Neural Network for a Robot

I need to implement a robot brain, and I used a feedforward neural network (FNN) as the controller. The robot has 24 sonar sensors and only one output, which is one of R=Right, L=Left, F=Forward, B=Back. I also have a large dataset that contains sonar data and the desired output. The FNN is trained using the backpropagation algorithm.
I used Neuroph Studio to construct the FNN and to do the training. Here are the network parameters:
Input layer: 24
Hidden Layer: 10
Output Layer: 1
LearningRate: 0.5
Momentum: 0.7
GlobalError: 0.1
My problem is that during training the error drops slightly and then seems to stay static. I tried changing the parameters but I'm not getting any useful result!
Thanks for your help
Use 1-of-n encoding for the output: use 4 output neurons, and set up your target (output) data like this:
1 0 0 0 = right
0 1 0 0 = left
0 0 1 0 = forward
0 0 0 1 = back
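A minimal Java sketch of these 1-of-n targets and of decoding the four outputs back into a command by picking the most active neuron (argmax). The enum, class and method names are hypothetical and not part of Neuroph.

```java
import java.util.Arrays;

// Minimal sketch (hypothetical names): 1-of-n targets for the four commands
// and argmax decoding of the network's four output activations.
class OneHotCommands {
    enum Command { RIGHT, LEFT, FORWARD, BACK }

    // Target vector for training, e.g. FORWARD -> {0, 0, 1, 0}.
    static double[] encode(Command c) {
        double[] target = new double[Command.values().length];
        target[c.ordinal()] = 1.0;
        return target;
    }

    // Pick the command whose output neuron is most active.
    static Command decode(double[] outputs) {
        int best = 0;
        for (int i = 1; i < outputs.length; i++) {
            if (outputs[i] > outputs[best]) {
                best = i;
            }
        }
        return Command.values()[best];
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(Command.FORWARD)));
        System.out.println(decode(new double[] {0.1, 0.2, 0.7, 0.05}));
    }
}
```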
Reduce the number of input sensors (and corresponding input neurons) to begin with, down to 3 or 5. This will simplify things so you can understand what's going on. Later you can build back up to 24 inputs.
Neural networks often get stuck in local minima during training; that could be why your error is static. Increasing the momentum can help avoid this.
Your learning rate looks quite high. Try 0.1, but play around with these values. Every problem is different and there are no values guaranteed to work.
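For reference, the classical momentum form of the backpropagation weight update is below. Treating it as what Neuroph's momentum setting does is an assumption, not something taken from its documentation. Here η is the learning rate (0.5 in the question), α is the momentum (0.7) and E is the training error.

```latex
\Delta w_{ij}(t) = -\eta \,\frac{\partial E}{\partial w_{ij}} + \alpha \,\Delta w_{ij}(t-1),
\qquad
w_{ij}(t+1) = w_{ij}(t) + \Delta w_{ij}(t)

% If the gradient g stays roughly constant, the steps approach the geometric-series limit
\Delta w \to -\frac{\eta}{1-\alpha}\, g = -\frac{0.5}{1 - 0.7}\, g \approx -1.67\, g
```

So with these settings the effective step is more than three times the nominal learning rate, which is one more reason to try a smaller value such as 0.1.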

My neural network forgets the last training when I try to teach next set of training inputs

I'm learning neural networks (I started today) and managed to finish a 2x2x1 network (forward data feeding and backward error propagation) that can learn the AND operation for one set of inputs. It also avoids local minima by using randomized parameters. My first source for this is: http://www.codeproject.com/Articles/14342/Designing-And-Implementing-A-Neural-Network-Librar
The problem is: it learns 0 AND 0 using inputs (0,0), but when I give it (0,1) it forgets 0 AND 0 and then learns 0 AND 1. Is this a common newbie bug?
What I tried:
loop for 10000 times
learn 0 and 0
end loop
loop for 10000 times
learn 0 and 1 (forgets 0 and 0)
end loop
loop for 10000 times
learn 1 and 0 (forgets 0 and 1)
end loop
loop for 10000 times
learn 1 and 1 (forgets 1 and 0)
end loop
only one set is learned
fail
Trial 2:
loop for 10000 times
learn 0 and 0
learn 0 and 1
learn 1 and 0
learn 1 and 1
end loop
gives same result for all input combinations.
fail.
Activation function for each neuron: hyperbolic tangent
2x2 structure: all-pairs
2x1 structure: all-pairs
Randomized learning rate: yes, small enough to keep far from explosive iteration (per iteration)
Randomized bias per neuron: yes, between -0.5 and +0.5 (just at start)
Randomized weighting: yes, between -0.5 and +0.5 (just at start)
Edit: Bias and weight updates are done for all-pairs of hidden and output layers.
Edit: All neurons(hidden+output) use same activation function.
Without specific code it is hard to say for sure, but I think the issue is that you are only giving it one case to learn at a time. You should give it a matrix of your different learning examples, with an expected result vector. Then, when you update your weights and biases, you are finding the values that minimize the error between your network output for all cases, and the expected output for all cases.
For an AND gate, your input would be (in MATLAB code, not sure what language you are using but that syntax is easy to understand):
input = [0, 0;
0, 1;
1, 0;
1, 1];
And your expected output would be:
output = [0;
0;
0;
1];
I think what you are doing now is basically finding the weights and biases that minimize the error between the network output and the expected output for just one input case, and then re-training those weights and biases to minimize the error for the second case, then the third, then the fourth. If you put them in arrays like this it should minimize the overall error for all cases. This is just my best guess though without any code to go on.
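Here is a minimal Java sketch of training on all four AND cases together: the updates from every pattern are accumulated and the weights change once per pass (full-batch gradient descent). To keep it short it assumes a single sigmoid unit with a bias instead of the 2x2x1 tanh network in the question; weights start in [-0.5, 0.5) as in the question, and the learning rate, epoch count and seed are illustrative.

```java
import java.util.Random;

// Minimal sketch (assumptions): one sigmoid unit with a bias instead of the
// question's 2x2x1 tanh network; all four AND cases are used in every update.
class BatchAnd {
    static final double[][] INPUTS = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
    static final double[] TARGETS = {0, 0, 0, 1};

    static double sigmoid(double z) {
        return 1.0 / (1.0 + Math.exp(-z));
    }

    static double output(double[] w, double[] x) {
        return sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]); // w[2] is the bias
    }

    static double[] train(double rate, int epochs, long seed) {
        Random rng = new Random(seed);
        double[] w = new double[3];
        for (int i = 0; i < w.length; i++) {
            w[i] = rng.nextDouble() - 0.5; // uniform in [-0.5, 0.5)
        }
        for (int epoch = 0; epoch < epochs; epoch++) {
            // Accumulate the update from every case before touching the weights,
            // so each step reduces the error over all cases at once.
            double[] accumulated = new double[3];
            for (int p = 0; p < INPUTS.length; p++) {
                double out = output(w, INPUTS[p]);
                double delta = (TARGETS[p] - out) * out * (1.0 - out);
                accumulated[0] += delta * INPUTS[p][0];
                accumulated[1] += delta * INPUTS[p][1];
                accumulated[2] += delta; // bias input is a constant 1
            }
            for (int i = 0; i < w.length; i++) {
                w[i] += rate * accumulated[i];
            }
        }
        return w;
    }

    public static void main(String[] args) {
        double[] w = train(0.5, 20000, 7L);
        for (int p = 0; p < INPUTS.length; p++) {
            System.out.printf("%.0f AND %.0f -> %.4f%n", INPUTS[p][0], INPUTS[p][1], output(w, INPUTS[p]));
        }
    }
}
```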