How to train this neural network?

I programmed a simple backpropagation neural network. Here is the code snippet:
for (int i = 0; i < 10000; i++)
{
    //i1 = Convert.ToDouble(textBox1.Text);
    //i2 = Convert.ToDouble(textBox2.Text);
    //desired = Convert.ToDouble(textBox3.Text);
    Random rnd = new Random();
    i1 = rnd.Next(0, 1);   // note: the upper bound of Next is exclusive, so this always returns 0
    Random rnd1 = new Random();
    i2 = rnd1.Next(0, 1);  // same here
    if (i1 == 1 && i2 == 1)
    {
        desired = 0;
    }
    else if (i1 == 0 && i2 == 0)
    {
        desired = 0;
    }
    else
    {
        desired = 1;
    }
    // hidden layer weighted sums
    h1 = i1 * w1 + i2 * w2;
    h2 = i1 * w3 + i2 * w4;
    h3 = i1 * w5 + i2 * w6;
    // hidden layer activations
    h1v = Sigmoid(h1);
    h2v = Sigmoid(h2);
    h3v = Sigmoid(h3);
    // final output
    output = h1v * w7 + h2v * w8 + h3v * w9;
    outputS = Sigmoid(output);
    // BACKPROPAGATION
    // margin of error
    Error = desired - outputS; // desired = the value it should be, outputS = the guessed value
    // delta output sum
    deltaoutputsum = Derivative(output) * Error; // output before the sigmoid, times the error
    // old weights w7, w8, w9
    w7b = w7; // 0.3
    w8b = w8; // 0.5
    w9b = w9; // 0.9
    w7 = w7 + deltaoutputsum * h1v; // weight w7
    w8 = w8 + deltaoutputsum * h2v; // weight w8
    w9 = w9 + deltaoutputsum * h3v; // weight w9
    // delta hidden sums (note: this overwrites h1, h2, h3)
    h1 = deltaoutputsum * w7b * Derivative(h1);
    h2 = deltaoutputsum * w8b * Derivative(h2);
    h3 = deltaoutputsum * w9b * Derivative(h3);
    // weights w1..w6
    w1 = w1 - h1 * i1;
    w2 = w2 - h1 * i2;
    w3 = w3 - h2 * i1;
    w4 = w4 - h2 * i2;
    w5 = w5 - h3 * i1;
    w6 = w6 - h3 * i2;
    label1.Text = outputS.ToString();
    label2.Text = w1.ToString();
    label3.Text = w2.ToString();
    label4.Text = w3.ToString();
    label5.Text = w4.ToString();
    label6.Text = w5.ToString();
    label7.Text = w6.ToString();
    label8.Text = w7.ToString();
    label9.Text = w8.ToString();
    label10.Text = w9.ToString();
}
It is a very simple network for solving XOR problems, but I don't know how to predict the output. Here I have to provide the answer in order to set the weights, but how do I predict?
It trains for 10,000 iterations on random training data.
Now that it is trained, how do I predict the answer?
Please help.
Sorry for my English, but I don't know it very well.
h1-h3 are the weighted sums of the hidden nodes.
h1v-h3v are the values of the hidden nodes.
w1-w9 are the weights.
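To predict after training, you run just the forward pass with the learned weights and skip all of the weight updates. A minimal sketch, assuming the same w1-w9 fields and Sigmoid method as in the code above (the Predict method name is my own):
public double Predict(double i1, double i2)
{
    // forward pass only, identical to the training code but with no weight updates
    double h1 = i1 * w1 + i2 * w2;
    double h2 = i1 * w3 + i2 * w4;
    double h3 = i1 * w5 + i2 * w6;
    double h1v = Sigmoid(h1);
    double h2v = Sigmoid(h2);
    double h3v = Sigmoid(h3);
    double output = h1v * w7 + h2v * w8 + h3v * w9;
    return Sigmoid(output);
}
After successful training, Predict(1, 0) and Predict(0, 1) should return values close to 1, and Predict(0, 0) and Predict(1, 1) values close to 0.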

I believe your problem lies in how you are training.
Do the following and I believe your program will be correct
Try training on each of the data sets one after another instead of at random. Random order works for continuous floating-point values, but with XOR you can run into issues where training too often on one or two input pairs (because of the nature of random) pushes the weights away from values that work for the other XOR inputs. So train on [1,1], then immediately [1,0], then [0,1], then [0,0], and repeat over and over; a sketch of that order is below.
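Reusing the variables from the question (the patterns array is my own naming), the fixed order could look like this:
// the four XOR cases, visited in the same order on every pass
double[][] patterns =
{
    new double[] { 1, 1, 0 },
    new double[] { 1, 0, 1 },
    new double[] { 0, 1, 1 },
    new double[] { 0, 0, 0 }
};
for (int i = 0; i < 10000; i++)
{
    foreach (double[] p in patterns)
    {
        i1 = p[0];
        i2 = p[1];
        desired = p[2];
        // ... forward pass and weight updates exactly as in the question ...
    }
}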
Make sure the derivative function is correct; the derivative of a sigmoid should be sigmoid(x) - sigmoid(x)^2
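For reference, a minimal matched pair for the standard logistic sigmoid 1 / (1 + e^-x), which is what that derivative formula assumes (sigmoid(x) - sigmoid(x)^2 is the same as sigmoid(x) * (1 - sigmoid(x))):
public double Sigmoid(double x)
{
    // standard logistic function, output in (0, 1)
    return 1.0 / (1.0 + Math.Exp(-x));
}

public double Derivative(double x)
{
    // derivative of the logistic function: Sigmoid(x) * (1 - Sigmoid(x))
    double s = Sigmoid(x);
    return s * (1 - s);
}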
Name your delta hidden sum values something different from h1, h2, etc. if you are already using those names for the hidden node input values.
If you do those things, it appears you should have something exactly mathematically equivalent to what "how to build a neural-network" has.
I would also recommend declaring values that aren't persistent inside your loop instead of outside. I may be wrong, but I don't think any value except your w1, w2, w3, etc. needs to persist across training iterations. Not doing this causes hard-to-catch bugs and makes the code harder to read, since you can't guarantee the variables aren't being modified elsewhere.
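For example, keeping only the weights as class fields and everything else local to one iteration (assuming i1, i2 and desired are set inside the loop as in the question):
for (int i = 0; i < 10000; i++)
{
    // ... pick i1, i2 and desired for this iteration ...
    double h1 = i1 * w1 + i2 * w2;
    double h2 = i1 * w3 + i2 * w4;
    double h3 = i1 * w5 + i2 * w6;
    double h1v = Sigmoid(h1);
    double h2v = Sigmoid(h2);
    double h3v = Sigmoid(h3);
    double output = h1v * w7 + h2v * w8 + h3v * w9;
    double outputS = Sigmoid(output);
    // ... error, deltas and weight updates as before, also as locals ...
}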

Related

My two layer neural network model doesn't converge

I am training a two-layer neural network. I waited for 15,000 epochs and the model still doesn't converge.
import random
import numpy as np
import pandas as pd

ans = []
for i in range(1000):
    x1, y1 = random.uniform(-3, 3), random.uniform(-3, 3)
    if x1 * x1 + y1 * y1 < 1:
        ans.append([x1, y1, 0])
    elif x1 * x1 + y1 * y1 >= 2 and x1 * x1 + y1 * y1 <= 8:
        ans.append([x1, y1, 1])
data = pd.DataFrame(ans)
print(data.shape)
X = np.array(data[[0, 1]])
y = np.array(data[2])
I am generating random points to create the data. The data looks something like this.
weights_layer1 = np.random.normal(scale=1 / 10**.5, size=(2, 20))
bias1 = np.zeros((1, 20))
bias2 = np.zeros((1, 1))
weights_layer2 = np.random.normal(scale=1 / 10**.5, size=(20, 1))
for e in range(15000):
    for x, y1 in zip(X, y):
        x = x.reshape(1, 2)
        layer1 = sigmoid(np.dot(x, weights_layer1) + bias1)
        layer2 = sigmoid(np.dot(layer1, weights_layer2) + bias2)
        dk = (y1 - layer2) * layer2 * (1 - layer2)
        dw2 = learnrate * dk * layer1.T
        dw2 = dw2.reshape(weights_layer2.shape)
        # print(dw2.shape)
        weights_layer2 += dw2
        # bias2 += dk * learnrate
        dj = weights_layer2.T * layer1 * (1 - layer1) * dk
        dw1 = learnrate * np.dot(x.T, dj)
I am calculating loss in this manner.
loss = 0
for x, y1 in zip(X, y):
    layer1 = sigmoid(np.dot(x, weights_layer1))
    layer2 = sigmoid(np.dot(layer1, weights_layer2))
    loss += (layer2 - y1)**2
print(loss)
I can't find what is going wrong; can you see anything? Thanks. I trained the same network with PyTorch and it converges fine.
The final model looks like this on the training data, but on the test data it is worse.
After a few hours of trying things out, I found the problem. This network doesn't converge without biases. With biases added, it converged in 5,000 epochs.

matrix singular under determined linear system not solvable

Following this question, I modified my code to:
model test
  // types
  type Mass = Real(unit = "Kg", min = 0);
  type Length = Real(unit = "m");
  type Area = Real(unit = "m2", min = 0);
  type Force = Real(unit = "Kg.m/s2");
  type Pressure = Real(unit = "Kg/m/s2");
  type Torque = Real(unit = "Kg.m2/s2");
  type Velocity = Real(unit = "m/s");
  type Time = Real(unit = "s");
  // constants
  constant Real pi = 2 * Modelica.Math.asin(1.0);
  parameter Mass Mp = 0.01;
  parameter Length r1 = 0.010;
  parameter Length r3 = 0.004;
  parameter Integer n = 3;
  parameter Area A = 0.020 * 0.015;
  parameter Time Stepping = 1.0;
  parameter Real DutyCycle = 1.0;
  parameter Pressure Pin = 500000;
  parameter Real Js = 1;
  //parameter Real Muk = 0.0;
  parameter Real Muk = 0.158;
  // variables
  Length x[n];
  Velocity vx[n];
  Real theta;
  Real vt;
  Pressure P[n];
  Force Fnsp[n];
  Torque Tfsc;
initial equation
  theta = 0;
  vt = 0;
algorithm
  for i in 1:n loop
    if noEvent((i - 1) * Stepping < mod(time, n * Stepping)) and noEvent(mod(time, n * Stepping) < Stepping * ((i - 1) + DutyCycle)) then
      P[i] := Pin;
    else
      P[i] := 0;
    end if;
  end for;
  Tfsc := -r3 * Muk * sign(vt) * abs(sum(Fnsp));
equation
  vx = der(x);
  vt = der(theta);
  x = r1 * {sin(theta + (i - 1) * 2 * pi / n) for i in 1:n};
  Mp * der(vx) + P * A = Fnsp;
  Js * der(theta) = Tfsc - r1 * Fnsp * {cos(theta + (i - 1) * 2 * pi / n) for i in 1:n};
  // Js * der(theta) = - r1 * Fnsp * {cos(theta + (i - 1) * 2 * pi / n) for i in 1:n};
  annotation(
    experiment(StartTime = 0, StopTime = 30, Tolerance = 1e-06, Interval = 0.03),
    __OpenModelica_simulationFlags(lv = "LOG_STATS", outputFormat = "mat", s = "dassl"));
end test;
However, I get the preprocessing warning of
[1] .... Translation Warning
Iteration variables with default zero start attribute in torn nonlinear equation system:
Fnsp[3]:VARIABLE(unit = "Kg.m/s2" ) type: Real [3]
Fnsp[2]:VARIABLE(unit = "Kg.m/s2" ) type: Real [3]
Fnsp[1]:VARIABLE(unit = "Kg.m/s2" ) type: Real [3]
$DER.vt:VARIABLE() type: Real
which doesn't make sense to me, but which I assume I can safely ignore, and the compilation error of:
Matrix singular!
under-determined linear system not solvable
which had also been previously reported here. If I remove the lines
Torque Tfsc;
and
Tfsc := -r3 * Muk * sign(vt) * abs(sum(Fnsp));
and change the torque equation to
Js * der(theta) = - r1 * Fnsp * {cos(theta + (i - 1) * 2 * pi / n) for i in 1:n};
it works perfectly fine. However, setting Muk to zero, which is theoretically the same thing, leads to the same error as above! I would appreciate it if you could help me understand what the problem is and how I can resolve it.
P.S.1. On the demo version of Dymola the simulation test finishes with no errors, only the warning:
Some variables are iteration variables of the initialization problem:
but they are not given any explicit start values. Zero will be used.
Iteration variables:
der(theta, 2)
P[1]
P[2]
P[3]
P.S.2. Using JModelica, removing the noEvent calls and using the Python code:
model_name = 'test'
mo_file = 'test.mo'
from pymodelica import compile_fmu
from pyfmi import load_fmu
my_fmu = compile_fmu(model_name, mo_file)
myModel = load_fmu('test.fmu')
res = myModel.simulate(final_time=30)
theta = res['theta']
t = res['time']
import matplotlib.pyplot as plt
plt.plot(t, theta)
plt.show()
it solves the model blazingly fast for small values (e.g. 0.1) of Muk. But again it gets stuck for bigger values. The only warnings are:
Warning at line 30, column 3, in file 'test.mo':
Iteration variable "Fnsp[2]" is missing start value!
Warning at line 30, column 3, in file 'test.mo':
Iteration variable "Fnsp[3]" is missing start value!
Warning in flattened model:
Iteration variable "der(_der_theta)" is missing start value!
You do not need to use an algorithm section for these assignments (even if they are inside a for-loop and an if). I moved them to the equation section and removed your algorithm section completely:
model test
  // types
  type Mass = Real(unit = "Kg", min = 0);
  type Length = Real(unit = "m");
  type Area = Real(unit = "m2", min = 0);
  type Force = Real(unit = "Kg.m/s2");
  type Pressure = Real(unit = "Kg/m/s2");
  type Torque = Real(unit = "Kg.m2/s2");
  type Velocity = Real(unit = "m/s");
  type Time = Real(unit = "s");
  // constants
  constant Real pi = 2 * Modelica.Math.asin(1.0);
  parameter Mass Mp = 0.01;
  parameter Length r1 = 0.010;
  parameter Length r3 = 0.004;
  parameter Integer n = 3;
  parameter Area A = 0.020 * 0.015;
  parameter Time Stepping = 1.0;
  parameter Real DutyCycle = 1.0;
  parameter Pressure Pin = 500000;
  parameter Real Js = 1;
  //parameter Real Muk = 0.0;
  parameter Real Muk = 0.158;
  // variables
  Length x[n];
  Velocity vx[n];
  Real theta;
  Real vt;
  Pressure P[n];
  Force Fnsp[n];
  Torque Tfsc;
initial equation
  theta = 0;
  vt = 0;
equation
  for i in 1:n loop
    if noEvent((i - 1) * Stepping < mod(time, n * Stepping)) and noEvent(mod(time, n * Stepping) < Stepping * ((i - 1) + DutyCycle)) then
      P[i] = Pin;
    else
      P[i] = 0;
    end if;
  end for;
  Tfsc = -r3 * Muk * sign(vt) * abs(sum(Fnsp));
  vx = der(x);
  vt = der(theta);
  x = r1 * {sin(theta + (i - 1) * 2 * pi / n) for i in 1:n};
  Mp * der(vx) + P * A = Fnsp;
  Js * der(theta) = Tfsc - r1 * Fnsp * {cos(theta + (i - 1) * 2 * pi / n) for i in 1:n};
  // Js * der(theta) = - r1 * Fnsp * {cos(theta + (i - 1) * 2 * pi / n) for i in 1:n};
end test;
This makes it far easier for the compiler to find a sensible sorting and tearing for the strong components. It still breaks at 19 s, but up to that point it may be just what you are looking for. The Newton solver diverges after that threshold; since I don't really know what you are doing here, I unfortunately cannot provide any analysis of the results.
Also, it seems like the event triggered by your if-equation could be cleanly replaced by the sample() operator. You might want to have a look at that.

Failed to solve linear system of equations

I'm trying to simulate the model below:
model modelTest
  // types
  type Mass = Real (unit = "Kg", min = 0);
  type Length = Real (unit = "m");
  type Area = Real (unit = "m2", min = 0);
  type Force = Real (unit = "Kg.m/s2");
  type Pressure = Real (unit = "Kg/m/s2");
  type Torque = Real (unit = "Kg.m2/s2");
  type Velocity = Real (unit = "m/s");
  type Time = Real (unit = "s");
  // constants
  constant Real pi = 2 * Modelica.Math.asin(1.0);
  parameter Mass Mp = 0.01;
  parameter Length r1 = 0.010;
  parameter Integer n = 3;
  parameter Area A = 0.020 * 0.015;
  parameter Time Stepping = 0.1;
  parameter Real DutyCycle = 0.5;
  parameter Pressure Pin = 5000;
  parameter Real Js = 1;
  // variables
  Length x[n];
  Velocity vx[n];
  Real theta;
  Real vt;
  Pressure P[n];
initial equation
  theta = 0;
  vt = 0;
algorithm
  for i in 1:n loop
    if noEvent((i - 1) * Stepping < mod(time, Stepping)) and noEvent(mod(time, Stepping) < (i - 1) * Stepping + Stepping * DutyCycle) then
      P[i] := Pin;
    else
      P[i] := 0;
    end if;
  end for;
equation
  vx = der(x);
  vt = der(theta);
  x = r1 * {sin(theta + (i - 1) * 2 * pi / n) for i in 1:n};
  Js * der(theta) = r1 * sum((Mp * der(vx) + P * A) .* {cos(theta + (i - 1) * 2 * pi / n) for i in 1:n});
  annotation(
    experiment(StartTime = 0, StopTime = 10, Tolerance = 1e-6, Interval = 0.01),
    __OpenModelica_simulationFlags(lv = "LOG_STATS", outputFormat = "mat", s = "dassl"));
end modelTest;
but the solver never finishes, showing the error:
Failed to solve linear system of equations (no. 51) at time ... Residual norm is ...
The default linear solver fails, the fallback solver with total pivoting is started at time ... That might raise performance issues; for more information use -lv LOG_LS.
I would appreciate it if you could help me understand what the problem is and how I can solve it. Thanks in advance for your support.
P.S.1. I found this similar issue from 15 months ago.
P.S.2. There were several mistakes in the code. A modified version can be found here.

Sigmoid and its derivative

public double Sigmoid(double x)
{
    return 2 / (1 + Math.Exp(-2 * x)) - 1;
}

public double Derivative(double x)
{
    double s = Sigmoid(x) - (Sigmoid(x) * Sigmoid(x));
    return s;
}
When I train the network, it gives these outputs:
0,0 = 0, and it is always 0 // I don't know why
0,1 = 0.67 and it is going up // good, but after 1000 repeats it gets to 0.20 and it is going down
1,0 = 0.50 and it is going up // good, but after 1000 repeats it gets to 0.20 and it is going down
1,1 = 0.80 and it is going up // wrong, it should go down
Where is the mistake?
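For reference, the function 2 / (1 + Math.Exp(-2 * x)) - 1 used above is algebraically tanh(x), so a matching derivative for it would be 1 - tanh(x)^2 rather than Sigmoid(x) - Sigmoid(x)^2 (which is the derivative of the logistic sigmoid). A minimal sketch of a matched pair; the method names are my own:
public double TanhActivation(double x)
{
    // same value as 2 / (1 + Math.Exp(-2 * x)) - 1
    return Math.Tanh(x);
}

public double TanhDerivative(double x)
{
    // derivative of tanh(x) is 1 - tanh(x)^2
    double t = Math.Tanh(x);
    return 1 - t * t;
}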
Neural network (XOR and back propagation)
int pw = Convert.ToInt32(textBox1.Text);
for (int i12 = 0; i12 < pw; i12++)
{
    //i1 = Convert.ToDouble(textBox2.Text);
    //i2 = Convert.ToDouble(textBox3.Text);
    //desired = Convert.ToDouble(textBox1.Text);
    for (int i = 0; i < 4; i++)
    {
        if (i == 0)
        {
            i1 = 1;
            i2 = 1;
            desired = 0;
        }
        else if (i == 1)
        {
            i1 = 1;
            i2 = 0;
            desired = 1;
        }
        else if (i == 2)
        {
            i1 = 0;
            i2 = 1;
            desired = 1;
        }
        else if (i == 3)
        {
            i1 = 0;
            i2 = 0;
            desired = 0;
        }
        // double[] questions = new double[2];
        // questions[0] = 1;
        // questions[1] = 0;
        // Random rnd = new Random();
        // double s = questions[rnd.Next(0, 2)];
        // double s1 = questions[rnd.Next(0, 2)];
        // i1 = s;
        // i2 = s1;
        // hidden layer weighted sums
        h1 = i1 * w1 + i2 * w2;
        h2 = i1 * w3 + i2 * w4;
        h3 = i1 * w5 + i2 * w6;
        // hidden layer activations
        h1v = Sigmoid(h1);
        h2v = Sigmoid(h2);
        h3v = Sigmoid(h3);
        // final output
        output = h1v * w7 + h2v * w8 + h3v * w9;
        outputS = Sigmoid(output);
        // BACKPROPAGATION
        // margin of error
        Error = desired - outputS; // desired = the value it should be, outputS = the guessed value
        // delta output sum
        deltaoutputsum = Derivative(output) * Error * 0.05; // output before the sigmoid, times the error, scaled by 0.05
        // old weights w7, w8, w9
        w7b = w7; // 0.3
        w8b = w8; // 0.5
        w9b = w9; // 0.9
        w7 = w7 + deltaoutputsum * h1v; // weight w7
        w8 = w8 + deltaoutputsum * h2v; // weight w8
        w9 = w9 + deltaoutputsum * h3v; // weight w9
        // delta hidden sums (note: this overwrites h1, h2, h3)
        h1 = deltaoutputsum * w7b * Derivative(h1);
        h2 = deltaoutputsum * w8b * Derivative(h2);
        h3 = deltaoutputsum * w9b * Derivative(h3);
        // weights w1..w6
        w1 = w1 - h1 * i1;
        w2 = w2 - h1 * i2;
        w3 = w3 - h2 * i1;
        w4 = w4 - h2 * i2;
        w5 = w5 - h3 * i1;
        w6 = w6 - h3 * i2;
    }
}
Why, after training, does it give:
1,0 == close to 0, should be close to 1
1,1 == close to 1, should be 0
0,0 == it is good, close to 0
0,1 == close to 0, should be close to 1
This is the code to use after training (i1 and i2 are the inputs, 1 or 0):
i1 = Convert.ToDouble(textBox4.Text);
i2 = Convert.ToDouble(textBox5.Text);
// hidden layer weighted sums
h1 = i1 * w1 + i2 * w2;
h2 = i1 * w3 + i2 * w4;
h3 = i1 * w5 + i2 * w6;
// hidden layer activations
h1v = Sigmoid(h1);
h2v = Sigmoid(h2);
h3v = Sigmoid(h3);
// final output
output = h1v * w7 + h2v * w8 + h3v * w9;
outputS = Sigmoid(output);
MessageBox.Show(outputS.ToString());
w1-w9 are the weights. h1v-h3v are the values of the hidden nodes. h1-h3 are the weighted sums of the hidden nodes.

Octave backpropagation implementation issues

I wrote code to implement steepest descent backpropagation, with which I am having issues. I am using the Machine CPU dataset and have scaled the inputs and outputs into the range [0, 1].
The code in MATLAB/Octave is as follows:
steepest descent backpropagation
%SGD = Steepest Gradient Descent
function weights = nnSGDTrain (X, y, nhid_units, gamma, max_epoch, X_test, y_test)
  iput_units = columns (X);
  oput_units = columns (y);
  n = rows (X);
  W2 = rand (nhid_units + 1, oput_units);
  W1 = rand (iput_units + 1, nhid_units);
  train_rmse = zeros (1, max_epoch);
  test_rmse = zeros (1, max_epoch);
  for (epoch = 1:max_epoch)
    delW2 = zeros (nhid_units + 1, oput_units)';
    delW1 = zeros (iput_units + 1, nhid_units)';
    for (i = 1:rows(X))
      o1 = sigmoid ([X(i,:), 1] * W1);                  %1xn+1 * n+1xk = 1xk
      o2 = sigmoid ([o1, 1] * W2);                      %1xk+1 * k+1xm = 1xm
      D2 = o2 .* (1 - o2);
      D1 = o1 .* (1 - o1);
      e = (y_test(i,:) - o2)';
      delta2 = diag (D2) * e;                           %mxm * mx1 = mx1
      delta1 = diag (D1) * W2(1:(end-1),:) * delta2;    %kxm * mx1 = kx1
      delW2 = delW2 + (delta2 * [o1 1]);                %mx1 * 1xk+1 = mxk+1 %already transposed
      delW1 = delW1 + (delta1 * [X(i, :) 1]);           %kx1 * 1xn+1 = k*n+1 %already transposed
    end
    delW2 = gamma .* delW2 ./ n;
    delW1 = gamma .* delW1 ./ n;
    W2 = W2 + delW2';
    W1 = W1 + delW1';
    [dummy train_rmse(epoch)] = nnPredict (X, y, nhid_units, [W1(:);W2(:)]);
    [dummy test_rmse(epoch)] = nnPredict (X_test, y_test, nhid_units, [W1(:);W2(:)]);
    printf ('Epoch: %d\tTrain Error: %f\tTest Error: %f\n', epoch, train_rmse(epoch), test_rmse(epoch));
    fflush (stdout);
  end
  weights = [W1(:);W2(:)];
  % plot (1:max_epoch, test_rmse, 1);
  % hold on;
  plot (1:max_epoch, train_rmse(1:end), 2);
  % hold off;
end
predict
%Now SFNN Only
function [o1 rmse] = nnPredict (X, y, nhid_units, weights)
  iput_units = columns (X);
  oput_units = columns (y);
  n = rows (X);
  W1 = reshape (weights(1:((iput_units + 1) * nhid_units),1), iput_units + 1, nhid_units);
  W2 = reshape (weights((((iput_units + 1) * nhid_units) + 1):end,1), nhid_units + 1, oput_units);
  o1 = sigmoid ([X ones(n,1)] * W1);   %nxiput_units+1 * iput_units+1xnhid_units = nxnhid_units
  o2 = sigmoid ([o1 ones(n,1)] * W2);  %nxnhid_units+1 * nhid_units+1xoput_units = nxoput_units
  rmse = RMSE (y, o2);
end
RMSE function
function rmse = RMSE (a1, a2)
  rmse = sqrt (sum (sum ((a1 - a2).^2)) / rows(a1));
end
I have also trained on the same dataset using the R RSNNS package's mlp, and the RMSE for the train set (first 100 examples) is around 0.03. But in my implementation I cannot achieve an RMSE lower than 0.14. The errors sometimes grow for higher learning rates, and no learning rate gets me an RMSE below 0.14. A paper I referred to also reports an RMSE of around 0.03 for the train set.
I wanted to know where the problem in the code is. I have followed Raul Rojas' book and confirmed that things are okay.
In the backpropagation code, the line
e = (y_test(i,:) - o2)';
is not correct, because o2 is the output for an example from the train set while I am taking the difference against an example from the test set, y_test. The line should have been as below:
e = (y(i,:) - o2)';
which correctly finds the difference between the output predicted by the current model and the target output of the corresponding example.
It took me 3 days to find this one; I am fortunate to have found this bug, which was stopping me from going on to further modifications.