My neural network forgets the last training when I try to teach next set of training inputs

My neural network forgets the last training when I try to teach next set of training inputs - neural-network

Im learning(started today) neural networks and could finish a 2x2x1 network(forward data feeding and backward error propagated) that can learn AND operation for one set of inputs. It also dodges any local minimums using randomized parameters. My first source for this is: http://www.codeproject.com/Articles/14342/Designing-And-Implementing-A-Neural-Network-Librar
The problem is: it learns 0 AND 0 using inputs (0,0) but when I give (0,1) it forgets 0 AND 0 then learns 0 AND 1. Is this a general newbie bug?
What I tried:
loop for 10000 times
learn 0 and 0
end loop
loop for 10000 times
learn 0 and 1 (forgets 0 and 0)
end loop
loop for 10000 times
learn 1 and 0 (forgets 0 and 1)
end loop
loop for 10000 times
learn 1 and 1 (forgets 1 and 0)
end loop
only one set is learned
fail
Trial 2:
loop for 10000 times
learn 0 and 0
learn 0 and 1
learn 1 and 0
learn 1 and 1
end loop
gives same result for all input combinations.
fail.
Activation function for each neuron: hyperbolic tangent
2x2 structure: all-pairs
2x1 structure: all-pairs
Randomized learning rate: yes, small enough to keep far from explosive iteration (per iteration)
Randomized bias per neuron: yes, between -0.5 and +0.5 (just at start)
Randomized weighting: yes, between -0.5 and +0.5 (just at start)
Edit: Bias and weight updates are done for all-pairs of hidden and output layers.
Edit: All neurons(hidden+output) use same activation function.

Without specific code it is hard to say for sure, but I think the issue is that you are only giving it one case to learn at a time. You should give it a matrix of your different learning examples, with an expected result vector. Then, when you update your weights and biases, you are finding the values that minimize the error between your network output for all cases, and the expected output for all cases.
For an AND gate, your input would be (in MATLAB code, not sure what language you are using but that syntax is easy to understand):
input = [0, 0;
0, 1;
1, 0;
1, 1];
And your expected output would be:
output = [0;
0;
0;
1];
I think what you are doing now is basically finding the weights and biases that minimize the error between the network output and the expected output for just one input case, and then re-training those weights and biases to minimize the error for the second case, then the third, then the fourth. If you put them in arrays like this it should minimize the overall error for all cases. This is just my best guess though without any code to go on.

Related

Not getting the correct expectation in MATLAB

I am trying to make the random number generator work, however my expectation is not correct (unless I am fundamentally misunderstanding something).
I have this preset array :
array=[1 0 0]
I proceed to generate a random index between 1 and 3 using the rand() function in MATLAB.
index=round(1+2*rand(1,1),0); %generate random index between 1 and 3
Subsequently, I am trying to put together an array of entries accessed by the randomly generated index for some number of iterations:
answer_array=[] %answer holder
for i = 1:1:iterations %Loop for the set number of iterations
index=round(1+2*rand(1,1),0);
answer_array=[answer_array array(index)]
end
Then I try to sum the number of 1's in the answer_array and find the ratio of that to the number of iterations
temp=0;
for i=1:size(answer_array,2)
temp=temp+answer_array(1,i) %just sum all array contents as they're either 1 or 0
end
ratio=temp/iterations
No matter how large I set my iterations, the ratio converges at 0.24.
Should it not be 0.33? My reasoning is that if I randomly pick the index myself, I have 1/3 chance of getting 1. Thus if I continue picking for a large number of times, my success chance will converge at 1/3?

MSE in neuralnet results and roc curve of the results

Hi my question is a bit long please bare and read it till the end.
I am working on a project with 30 participants. We have two type of data set (first data set has 30 rows and 160 columns , and second data set has the same 30 rows and 200 columns as outputs=y and these outputs are independent), what i want to do is to use the first data set and predict the second data set outputs.As first data set was rectangular type and had high dimension i have used factor analysis and now have 19 factors that cover up to 98% of the variance. Now i want to use these 19 factors for predicting the outputs of the second data set.
I am using neuralnet and backpropogation and everything goes well and my results are really close to outputs.
My questions :
1- as my inputs are the factors ( they are between -1 and 1 ) and my outputs scale are between 4 to 10000 and integer , should i still scaled them before running neural network ?
2-I scaled the data ( both input and outputs ) and then predicted with neuralnet , then i check the MSE error it was so high like 6000 while my prediction and real output are so close to each other. But if i rescale the prediction and outputs then check The MSE its near zero. Is it unbiased to rescale and then check the MSE ?
3- I read that it is better to not scale the output from the beginning but if i just scale the inputs all my prediction are 1. Is it correct to not to scale the outputs ?
4- If i want to plot the ROC curve how can i do it. Because my results are never equal to real outputs ?
Thank you for reading my question

[edit#1]: There is a publication on how to produce ROC curves using neural network results
http://www.lcc.uma.es/~jja/recidiva/048.pdf
1) You can scale your values (using minmax, for example). But only scale your training data set. Save the parameters used in the scaling process (in minmax they would be the min and max values by which the data is scaled). Only then, you can scale your test data set WITH the min and max values you got from the training data set. Remember, with the test data set you are trying to mimic the process of classifying unseen data. Unseen data is scaled with your scaling parameters from the testing data set.
2) When talking about errors, do mention which data set the error was computed on. You can compute an error function (in fact, there are different error functions, one of them, the mean squared error, or MSE) on the training data set, and one for your test data set.
4) Think about this: Let's say you train a network with the testing data set,and it only has 1 neuron in the output layer . Then, you present it with the test data set. Depending on which transfer function (activation function) you use in the output layer, you will get a value for each exemplar. Let's assume you use a sigmoid transfer function, where the max and min values are 1 and 0. That means the predictions will be limited to values between 1 and 0.
Let's also say that your target labels ("truth") only contains discrete values of 0 and 1 (indicating which class the exemplar belongs to).
targetLabels=[0 1 0 0 0 1 0 ];
NNprediction=[0.2 0.8 0.1 0.3 0.4 0.7 0.2];
How do you interpret this?
You can apply a hard-limiting function such that the NNprediction vector only contains the discreet values 0 and 1. Let's say you use a threshold of 0.5:
NNprediction_thresh_0.5 = [0 1 0 0 0 1 0];
vs.
targetLabels =[0 1 0 0 0 1 0];
With this information you can compute your False Positives, FN, TP, and TN (and a bunch of additional derived metrics such as True Positive Rate = TP/(TP+FN) ).
If you had a ROC curve showing the False Negative Rate vs. True Positive Rate, this would be a single point in the plot. However, if you vary the threshold in the hard-limit function, you can get all the values you need for a complete curve.
Makes sense? See the dependencies of one process on the others?

How to properly model ANN to find relationship between real value input-output data?

I'm trying to make an ANN which could tell me if there is causality between my input and output data. Data is following:
My input are measured values of pesticides (19 total) in an area eg:
-1.031413662 -0.156086316 -1.079232918 -0.659174849 -0.734577317 -0.944137546 -0.596917991 -0.282641072 -0.023508282 3.405638835 -1.008434997 -0.102330305 -0.65961995 -0.687140701 -0.167400684 -0.4387984 -0.855708613 -0.775964435 1.283238514
And the output is the measured value of plant-somthing in the same area (55 total) eg:
0.00 0.00 0.00 13.56 0 13.56 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 13.56 0 0 0 1.69 0 0 0 0 0 0 0 0 0 0 1.69 0 0 0 0 13.56 0 0 0 0 13.56 0 0 0 0 0 0
Values for input are in range from -2.5 to 10, and for output from 0 to 100.
So the question I'm trying to answer is: in what measure does pesticide A affect plant-somthings.
What are good ways to model (represent) input/output neurons to be able to process the mentioned input/output data? And how to scale/convert input/output data to be useful for NN?
Is there a book/paper that I should look at?

First, a neural network cannot find the causality between output and input, but only the correlation (just like every other probabilistic methods). Causality can only be derived logically from reasoning (and even then, it's not always clear, it all depends on your axioms).
Secondly, about how to design a neural network to model your data, here is a pretty simple rule that can be generally applied to make a first working draft:
set the number of input neurons = the number of input variables for one sample
set the number of output neurons = the number of output variables for one sample
then play with the number of hidden layers and the number of hidden neurons per hidden layer. In practice, you want to use the fewest number of hidden layers/neurons to model your data correctly, but enough so that the function approximated by your neural network fits correctly the data (else the error in output will be huge compared to the real output dataset).
Why do you need to use just enough neurons but not too much? This is because if you use a lot of hidden neurons, you are sure to overfit your data, and thus you will make a perfect prediction on your training dataset, but not in the general case when you will use real datasets. Theoretically, this is because a neural network is a function approximator, thus it can approximate any function, but using a too high order function will lead to overfitting. See PAC learning for more info on this.
So, in your precise case, the first thing to do is to clarify how many variables you have in input and in output for each sample. If it's 19 in input, then create 19 input nodes, and if you have 55 output variables, then create 55 output neurons.
About scaling and pre-processing, yes you should normalize your data between the range 0 and 1 (or -1 and 1 it's up to you and it depends on the activation function). A very good place to start is to watch the videos at the machine learning course by Andrew Ng at Coursera, this should get you kickstarted quickly and correctly (you'll be taught the tools to check that your neural network is working correctly, and this is immensely important and useful).
Note: you should check your output variables, from the sample you gave it seems they use discrete values: if the values are discrete, then you can use discrete output variables which will be a lot more precise and predictive than using real, floating values (eg, instead of having [0, 1.69, 13.56] as the possible output values, you'll have [0,1,2], this is called "binning" or multi-class categorization). In practice, this means you have to change the way your network works, by using a classification neural network (using activation functions such as sigmoid) instead of a regressive neural network (using activation functions such as logistic regression or rectified linear unit).

Neural Network for a Robot

I need to implement a Robot Brain, I used feedforward neural network as a Controller. The robot has 24 sonar sonsor, and only one ouput which is R=Right, L=Left, F=Forward, B=Back. I also have a large dataset which contain sonar data and the desired output. The FNN is trained using backpropagation algorithm.
I used neuroph Studio to construct the FNN and to do the trainnig. Here the network params:
Input layer: 24
Hidden Layer: 10
Output Layer: 1
LearnningRate: 0.5
Momentum: 0.7
GlobalError: 0.1
My problem is that during iteration the error drop slightly and seems to be static. I tried to change the parameter but I'm not getting any useful result!!
Thanks for your help

Use 1 of n encoding for the output. Use 4 output neurons, and set up your target (output) data like this:
1 0 0 0 = right
0 1 0 0 = left
0 0 1 0 = forward
0 0 0 1 = back
Reduce the number of input sensors (and corresponding input neurons) to begin with, down to 3 or 5. This will simplify things so you can understand what's going on. Later you can build back up to 24 inputs.
Neural networks often get stuck in local minima during training, that could be why your error is static. Increasing the momentum can help avoid this.
Your learning rate looks quite high. Try 0.1, but play around with these values. Every problem is different and there are no values guaranteed to work.

Performace Issue on MATLAB's vector addressing

I'm wondering, what is faster for addressing a single Element of a vector:
1) direct access via
result = a(index)
or
2) access an element via a matrix multiplication e.g
a = [1 2 3 4]';
b = [0 0 1 0];
result = b*a; % Would return 3
In my oppinion (which comes from "classic" programming like C++) the first method must be more performant, because of the direct access...the second method would need a iteration through both vectors(?).
The reason why I'm asking is, that matlab is very performant on matrix and vector operations, maybe I am missing any aspect and the second method is more effective...

A quick test:
function [] = fun1()
a = [1 2 3 4]';
b = [0 0 1 0];
tic;
for i=1:1000000
r = a(3);
end
toc;
end
Elapsed time: 0.006 seconds
Change a(3) to b*a
Elapsed time: 0.9 seconds
The performance difference is quite obvious(, and you should have done that yourself before asking this question).
Reason behind that:
No matter how efficient MATLAB's calculation is, MATLAB still needs to fetch the number 1 by 1, and do multiplication 1 by 1, and sum up. There is no hope to be faster than a single access.
In your special case, there are all 0's except 1, but it is useless to do optimization for single special case in my opinion, and the best optimization I can come up with still needs to access all the elements for at least once each.
EDIT:
It seems I am in quite good mood today....
Change a(3) to a(1)*b(1)+a(2)*b(2)+a(3)*b(3)+a(4)*b(4)
Elapsed time: 0.02 seconds
It seems that boundary checking (and/or other errands) take more time than the access and calculation.

Why would you think that multiplying a lot of numbers by zeros would be at all efficient? Even if MATLAB could be smart enough to do a test first before the multiply, it must then still do many tests.
I'm asking this question to make a point, that the dot product cannot possibly be at all efficient. Even if MATLAB were smart enough to know that there was only one element that was non-zero, to know that, it would need to do a search for the non-zero element. And how would MATLAB be smart enough to know that what you have written as a vector*vector dot product is actually intended just to access a single element, instead of a true dot product for nefarious purposes unknown to it?

How about
3) access an element by a boolean index matrix:
a = [1 2 3 4]';
b = [0 0 1 0];
result = a(b)
It's almost certainly going to be faster than (2), slower than (1).

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse