Hello, I have a question regarding my neural network. I use a network with two layers: a hidden layer with a tansig activation function and an output layer with a purelin activation function. I have 4 inputs and one output.
I trained my network with 25 neurons in layer 1 and 1 neuron in layer 2. Here is the schematic view of my system:
There is good agreement between my ANN outputs and my targets.
But after training the network, when I try to derive the equations of the system, something strange happens.
I find the weights and biases of the system as follows:
b1 = first layer bias = net.b{1}
b2 = second layer bias = net.b{2}
First layer weights = net.IW{1,1}
Second layer weights = net.LW{2,1}
And when I use this formula:
Y=b2+secondlayer weight*tanh((firstlayerweight*X+b1))
(Where X is my input) there is a large difference between my measured data and results of this formula (like the outputs I get are 1000 times larger than they should be).
Can anyone help me with this? Is this formula correct?
Edit:
Here are my weight and bias matrices:
%first layer weights(25*4):
111.993 48.59439 1.604747 -0.76531
10.8245 -107.793 173.7258 -123.2
-149.48 -105.421 102.6071 -79.2133
226.2926 -621.099 2440.947 776.9815
-66.5572 -116.593 -121.067 -55.3703
-9.1293 1.251525 2.716534 -0.3201
0.786728 -1.81314 -5.21815 1.619898
71.98442 -3.67078 -17.8482 5.911515
6.986454 -36.9755 -21.4408 1.50746
0.654341 -5.25562 10.34482 4.589759
0.304266 1.645312 5.004313 -1.51475
0.721048 -0.02945 -0.09663 -0.0004
60.96135 1.182228 4.733361 -0.40264
-1.58998 1.920395 5.533581 -1.71799
0.410825 0.056715 4.564767 -0.1073
2.298645 9.923646 82.42445 -8.89119
-2.46618 -1.59946 -3.41954 -2.68133
0.089749 -1.69783 -5.02845 1.541547
3.516938 0.358534 -10.9719 -0.33401
2.392706 -1.99236 -5.89471 1.815673
1.605158 4.174882 4.227769 -3.14685
-25.2093 -1.68014 -5.249 1.163255
52.30716 -67.498 87.13013 29.61436
9.195869 2.328432 -7.59047 -1.42355
3.120973 1.266733 8.182079 0.365206
%first layer biases(25):
47.07941005
-49.66890557
80.2745463
1251.640193
-228.1521936
-2.905218293
-2.802770641
52.59183653
-50.96014049
7.931255305
3.652513556
-0.125595632
40.47792689
2.343663068
1.712611331
67.61242524
-6.124504787
-3.288283849
-4.752833412
-1.921129618
6.818490015
-6.481531096
5.056644951
1.984717285
7.050001634
%second layer weights(25):
-97.96825145 122.9670213 -124.5566672 -0.046986176 -0.021205068 -5.990650681 1850.804141 2.964275181 -0.333027823 -0.070420859 -583.478589 -68.16211954 12.59658596 1257.471165 -138.4739514 0.07786882 0.238340539 -1546.523224 -2.751936791 363.5612929 -0.152249267 -20.71141251 0.094593198 -0.473042306 5.533999251
%second layer bias(1):
21.92849
For example when I put X=[8;400;5;9.5] I expect to get y = 20.8
But by using the formula
y=secondbias +secondlayer weight* tanh(firstlayer weight*X+first bias)
I get -111 for y which is strange
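One possible cause worth checking, stated as an assumption because the training script is not shown: networks created with MATLAB's feedforwardnet or fitnet apply mapminmax normalisation to the inputs and targets by default, so net.IW, net.LW and net.b act on data scaled to [-1, 1]. If that applies here, the hand-written formula needs a scaling step before the first layer and the inverse scaling after the second. A minimal NumPy sketch of that idea, where x_min, x_max, t_min and t_max are hypothetical names for the per-variable minima and maxima of the training data:

```python
import numpy as np

def mapminmax_apply(x, x_min, x_max, lo=-1.0, hi=1.0):
    """Scale x from [x_min, x_max] to [lo, hi] (same formula as mapminmax)."""
    return (hi - lo) * (x - x_min) / (x_max - x_min) + lo

def mapminmax_reverse(y, y_min, y_max, lo=-1.0, hi=1.0):
    """Map a normalised value in [lo, hi] back to [y_min, y_max]."""
    return (y - lo) * (y_max - y_min) / (hi - lo) + y_min

def forward(x, W1, b1, W2, b2, x_min, x_max, t_min, t_max):
    """tansig hidden layer and purelin output layer, with scaling on both ends."""
    xn = mapminmax_apply(x, x_min, x_max)   # normalise the raw input
    hidden = np.tanh(W1 @ xn + b1)          # tansig is tanh
    yn = W2 @ hidden + b2                   # purelin is the identity
    return mapminmax_reverse(yn, t_min, t_max)
```

In MATLAB the actual scaling settings, if they are in use, would be stored in the network's input and output process settings rather than chosen by hand; the values passed to forward here are placeholders.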
I am trying to wrap my head around Accord.NET.Neuro. I need a neural network library for a reinforcement learning problem. Following one of the examples, I have written this small piece of code in F#:
let inputs = [| [|0.0;1.0|] ; [|1.0;1.0|] |]
let inputdimension = inputs.[0] |> Array.length
let outputs = [| [|1.0|] ; [|0.0|] |]
let outputdimension = outputs.[0] |> Array.length
let network = Accord.Neuro.ActivationNetwork (
    SigmoidFunction (2.0), // transfer function
    inputdimension,
    2,                     // two neurons in the first layer
    outputdimension )      // one neuron in the second layer
let teacher = network |> LevenbergMarquardtLearning
teacher.RunEpoch(inputs,outputs)
How can I obtain the weights from trained network object? Network does not have any weight property, as far as I can tell. Also, in order to make predictions, there is a Compute method; so -after learning- a prediction is made running:
network.Compute( [|1.0;1.0|] )
for example for a given input. I have noticed that, after several epochs, the network adapts incrementally to the desired targets (as it should be), but -for the training- one just runs
teacher.RunEpoch(inputs,outputs)
several times. Apparently this affects the network instance: how is it possible?
Weights are accessible through the network's Layers property, and then through each layer's Neurons.
So, for the given example,
network.Layers
provides an array of Layer (Layer[]) where each element holds the information (and data) of one layer of connections. In the example we have input, hidden, and output units, so there are two sets of connections: input to hidden and hidden to output.
Suppose we want to know the weights from the input to the hidden layer:
network.Layers.[0]
This line returns a Layer object (Accord.Neuro.Layer), which has a Neurons property.
It is worthwhile to look into the interface of these two objects, because they represent the neural computations in Accord.Neuro.
Each neuron exposes the weights of that specific computational unit (and, for an ActivationNeuron, its threshold).
So, a possible helper function to go through the network and get both weights and thresholds would be:
let getWeights (n: ActivationNeuron) =
    (n.Weights, n.Threshold)
let getNetworkParameters (network: ActivationNetwork) =
    network.Layers
    |> Array.map (fun layer ->
        layer.Neurons
        |> Array.map (fun neuron ->
            (neuron :?> ActivationNeuron)
            |> getWeights))
I may add some other notes, the more I go through the Accord.Neuro API.
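On the last part of the question (how RunEpoch can change the network): the teacher is constructed from the network and keeps a reference to that same object, so every weight update it makes is visible through the original variable. The sketch below illustrates only that reference-sharing idea in Python. The Network and Teacher classes are hypothetical stand-ins, not Accord.NET's API, and the update is a plain delta rule rather than Levenberg-Marquardt.

```python
class Network:
    """Toy linear unit; a stand-in for a network object that owns its weights."""
    def __init__(self, weights):
        self.weights = list(weights)

    def compute(self, x):
        return sum(w * xi for w, xi in zip(self.weights, x))


class Teacher:
    """Keeps a reference to the network (not a copy) and updates it in place."""
    def __init__(self, network, rate=0.1):
        self.network = network
        self.rate = rate

    def run_epoch(self, inputs, targets):
        total_error = 0.0
        for x, t in zip(inputs, targets):
            err = t - self.network.compute(x)
            for i, xi in enumerate(x):
                # Mutates the very object the caller still holds.
                self.network.weights[i] += self.rate * err * xi
            total_error += err * err
        return total_error


net = Network([0.0, 0.0])
teacher = Teacher(net)
teacher.run_epoch([[0.0, 1.0], [1.0, 1.0]], [1.0, 0.0])
print(net.weights)  # no longer all zeros: the teacher changed `net` itself
```

Calling run_epoch repeatedly therefore keeps refining the same net object, which matches the incremental adaptation described in the question.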
I'm creating an LSTM encoder-decoder network using Keras, following the code provided here: https://github.com/LukeTonin/keras-seq-2-seq-signal-prediction. The only change I made is to replace the GRUCell with an LSTMCell. Both the encoder and the decoder consist of 2 layers of 35 LSTMCells each. The layers are stacked on top of (and combined with) each other using an RNN layer.
The LSTMCell returns 2 states, whereas the GRUCell returns 1 state. This is where I am encountering an error, as I do not know how to handle the 2 returned states of the LSTMCell in the code.
I have created two models: first, an encoder-decoder model; second, a prediction model. I am not encountering any problems in the encoder-decoder model, but I am encountering problems in the decoder of the prediction model.
The error I am getting is:
ValueError: Layer rnn_4 expects 9 inputs, but it received 3 input tensors. Input received: [<tf.Tensor 'input_4:0' shape=(?, ?, 1) dtype=float32>, <tf.Tensor 'input_11:0' shape=(?, 35) dtype=float32>, <tf.Tensor 'input_12:0' shape=(?, 35) dtype=float32>]
This error happens when this line below, in the prediction model, is run:
decoder_outputs_and_states = decoder(
    decoder_inputs, initial_state=decoder_states_inputs)
The section of code this fits into is:
encoder_predict_model = keras.models.Model(encoder_inputs,
                                           encoder_states)
decoder_states_inputs = []
# Read layers backwards to fit the format of initial_state
# For some reason, the states of the model are ordered backwards (state of the first layer at the end of the list)
# If instead of a GRU you were using an LSTM Cell, you would have to append two Input tensors since the LSTM has 2 states.
for hidden_neurons in layers[::-1]:
    # One state for GRU, but two states for LSTMCell
    decoder_states_inputs.append(keras.layers.Input(shape=(hidden_neurons,)))
decoder_outputs_and_states = decoder(
    decoder_inputs, initial_state=decoder_states_inputs)
decoder_outputs = decoder_outputs_and_states[0]
decoder_states = decoder_outputs_and_states[1:]
decoder_outputs = decoder_dense(decoder_outputs)
decoder_predict_model = keras.models.Model(
    [decoder_inputs] + decoder_states_inputs,
    [decoder_outputs] + decoder_states)
Could somebody help me with the for loop above, and with the initial states I should be passing to the decoder after that?
I had a similar error and solved it by doing what the comment says, adding a second input tensor for each layer:
# If instead of a GRU you were using an LSTM Cell, you would have to append two Input tensors since the LSTM has 2 states.
for hidden_neurons in layers[::-1]:
    # Two states per LSTMCell, so append one Input for each
    decoder_states_inputs.append(keras.layers.Input(shape=(hidden_neurons,)))
    decoder_states_inputs.append(keras.layers.Input(shape=(hidden_neurons,)))
That solved the problem for me.
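The bookkeeping behind that fix can be sketched without Keras: the decoder needs one state tensor per state of every stacked cell, so an LSTM stack needs twice as many state inputs as a GRU stack. The helper below is hypothetical (decoder_state_shapes is not part of Keras) and only builds the list of shapes in the same reversed layer order as the loop above; each shape would then be passed to keras.layers.Input(shape=shape).

```python
def decoder_state_shapes(layers, states_per_cell=2):
    """Return one shape tuple per state tensor, reading layers backwards.

    states_per_cell is 1 for a GRUCell and 2 for an LSTMCell.
    """
    shapes = []
    for hidden_neurons in layers[::-1]:
        for _ in range(states_per_cell):
            shapes.append((hidden_neurons,))
    return shapes


layers = [35, 35]  # two stacked layers of 35 cells, as in the question
lstm_shapes = decoder_state_shapes(layers, states_per_cell=2)
gru_shapes = decoder_state_shapes(layers, states_per_cell=1)
print(len(lstm_shapes), len(gru_shapes))
```

The decoder call then receives the decoder input plus len(shapes) state tensors, which is why a list sized for GRU states is too short once the cells are LSTMs.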
I'm testing the y = SinC(x) function with single hidden layer feedforward neural networks (SLFNs) with 20 neurons.
With an SLFN, the output weights (OW) of the output layer are given by
OW = pinv(H)*T
After adding a regularization parameter gamma, this becomes
OW = pinv(I/gamma+H'*H)*H'*T
so that, as
gamma -> Inf, it reduces to pinv(H'*H)*H'*T, which should equal pinv(H)*T, since pinv(H'*H)*H' == pinv(H).
But when I calculate pinv(H'*H)*H' and pinv(H), I find a huge difference between the two when the number of neurons is over 5 (under 5, they are equal or almost the same).
For example, when H is 10*10 matrix, cond(H) = 21137561386980.3, rank(H) = 10,
H = [0.736251410036783 0.499731137079796 0.450233920602169 0.296610970576716 0.369359425954153 0.505556211442208 0.502934880027889 0.364904559142718 0.253349959726753 0.298697900877265;
0.724064281864009 0.521667364351399 0.435944895257239 0.337878535128756 0.364906002569385 0.496504064726699 0.492798607017131 0.390656915261343 0.289981152837390 0.307212326718916;
0.711534656474153 0.543520341487420 0.421761457948049 0.381771374416867 0.360475582262355 0.487454209236671 0.482668250979627 0.417033287703137 0.329570921359082 0.315860145366824;
0.698672860220896 0.565207057974387 0.407705930918082 0.427683127210120 0.356068794706095 0.478412571446765 0.472552121296395 0.443893207685379 0.371735862991355 0.324637323886021;
0.685491077062637 0.586647027111176 0.393799811411985 0.474875155650945 0.351686254239637 0.469385056318048 0.462458480695760 0.471085139463084 0.415948455902421 0.333539494486324;
0.672003357663056 0.607763454504209 0.380063647372632 0.522520267708374 0.347328559602877 0.460377531907542 0.452395518357816 0.498449772544129 0.461556360076788 0.342561958147251;
0.658225608290477 0.628484290731116 0.366516925684188 0.569759064961507 0.342996293691614 0.451395814182317 0.442371323528726 0.525823695636816 0.507817005881821 0.351699689941632;
0.644175558300583 0.648743139215935 0.353177974096445 0.615761051907079 0.338690023332811 0.442445652121229 0.432393859824045 0.553043275759248 0.553944175102542 0.360947346089454;
0.629872705346690 0.668479997764613 0.340063877672496 0.659781468051379 0.334410299080102 0.433532713184646 0.422470940392161 0.579948548513999 0.599160649563718 0.370299272759337;
0.615338237874436 0.687641820315375 0.327190410302607 0.701205860709835 0.330157655029498 0.424662569229062 0.412610204098877 0.606386924575225 0.642749594844498 0.379749516620049];
T=[-0.806458764562879 -0.251682808380338 -0.834815868451399 -0.750626822371170 0.877733363571576 1 -0.626938984683970 -0.767558933097629 -0.921811074815239 -1]';
There is a huge difference between pinv(H'*H)*H'*T and pinv(H)*T, where
pinv(H'*H)*H'*T = [-4803.39093243484 3567.08623820149 668.037919243849 5975.10699147077
1709.31211566970 -1328.53407325092 -1844.57938928594 -22511.9388736373
-2377.63048959478 31688.5125271114]';
pinv(H)*T = [-19780274164.6438 -3619388884.32672 -76363206688.3469 16455234.9229156
-135982025652.153 -93890161354.8417 283696409214.039 193801203.735488
-18829106.6110445 19064848675.0189]'.
I also find that if I round H with round(H,2), pinv(H'*H)*H'*T and pinv(H)*T return the same answer. So I guess one of the reasons might be floating-point precision in MATLAB.
But since cond(H) is large, any small change in H may result in a large difference in the inverse of H, so I think the round function may not be a good test. As Cris Luengo mentioned, with a large condition number, numerical imprecision will affect the accuracy of the inverse.
In my test, I use 1000 training samples with inputs in [-10,10] and noise in [-0.2,0.2]; the test samples are noise free, and 20 neurons are selected. OW = pinv(H)*T gives reasonable results for SinC training, while the performance of OW = pinv(H'*H)*H'*T is worse. I then tried to increase the precision of H'*H with pinv(vpa(H'*H)), but there was no significant improvement.
Does anyone know how to solve this?
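The role of the condition number mentioned above can be made concrete. In the 2-norm, cond(H'*H) equals cond(H)^2, so with cond(H) around 2e13 the product H'*H has a condition number near 4e26, far beyond what double precision (about 16 digits) can resolve. The smallest singular values of H'*H then sit below pinv's tolerance and are treated as zero, so pinv(H'*H)*H' behaves like a truncated solution rather than pinv(H). A small NumPy sketch of the squaring effect, using a synthetic matrix with chosen singular values (an assumption for illustration, not the H from the question):

```python
import numpy as np

def matrix_with_singular_values(singular_values, seed=0):
    """Build a square matrix U*diag(s)*V' with random orthogonal U and V."""
    rng = np.random.default_rng(seed)
    n = len(singular_values)
    U, _ = np.linalg.qr(rng.standard_normal((n, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    return U @ np.diag(singular_values) @ V.T

# Singular values from 1 down to 1e-6, so cond(H) is about 1e6.
H = matrix_with_singular_values(np.logspace(0, -6, 6))
cond_H = np.linalg.cond(H)
cond_HtH = np.linalg.cond(H.T @ H)   # roughly cond(H) squared
print(cond_H, cond_HtH)

# Compare the two routes to the least-squares weights.
T = np.ones(6)
direct = np.linalg.pinv(H) @ T
normal_eq = np.linalg.pinv(H.T @ H) @ H.T @ T
rel_diff = np.max(np.abs(direct - normal_eq)) / np.max(np.abs(direct))
print(rel_diff)
```

Even at this moderate conditioning the normal-equations route is the less accurate of the two; working with H directly (pinv(H), or a QR/SVD-based least-squares solve) avoids squaring the condition number.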
After some research, the answer is that ELM is very sensitive to scaling and to the choice of activation function.
Please refer to this paper for details: https://dl.acm.org/citation.cfm?id=2797143.2797161
The paper at https://ieeexplore.ieee.org/document/8533625 presents a novel algorithm to improve the performance of ELM with respect to scaling.
So, I'm trying to implement a neural network with 3 layers in Python; however, I am not the brightest person, so anything with more than 2 layers is kind of difficult for me. The problem with this one is that its output gets stuck at 0.5 and it does not learn, and I have no clue where it went wrong. Thank you to anyone with the patience to explain the error to me. (I hope the code makes sense.)
import numpy as np
def sigmoid(x):
    return 1/(1+np.exp(-x))
def reduce(x):
    return x*(1-x)
l0=[np.array([1,1,0,0]),
    np.array([1,0,1,0]),
    np.array([1,1,1,0]),
    np.array([0,1,0,1]),
    np.array([0,0,1,0]),
    ]
output=[0,1,1,0,1]
syn0=np.random.random((4,4))
syn1=np.random.random((4,1))
for justanumber in range(1000):
    for i in range(len(l0)):
        l1=sigmoid(np.dot(l0[i],syn0))
        l2=sigmoid(np.dot(l1,syn1))
        l2_err=output[i]-l2
        l2_delta=reduce(l2_err)
        l1_err=syn1*l2_delta
        l1_delta=reduce(l1_err)
        syn1=syn1.T
        syn1+=l0[i].T*l2_delta
        syn1=syn1.T
        syn0=syn0.T
        syn0+=l0[i].T*l1_delta
        syn0=syn0.T
print l2
PS. I know that it might be a piece of trash as a script but that is why I asked for assistance
Your computations are not fully correct. For example, reduce is called on l1_err and l2_err, whereas it should be called on l1 and l2.
You are performing stochastic gradient descent. With so few parameters, it oscillates hugely. In this case, use full-batch gradient descent instead.
The bias units are not present, although technically you can still learn without bias.
I tried to rewrite your code with minimal changes. I have commented out your lines to show the changes.
#!/usr/bin/python3
import matplotlib.pyplot as plt
import numpy as np

def sigmoid(x):
    return 1/(1+np.exp(-x))

def reduce(x):
    # Derivative of the sigmoid, written in terms of the sigmoid output x
    return x*(1-x)

l0 = np.array([[1,1,0,0],
               [1,0,1,0],
               [1,1,1,0],
               [0,1,0,1],
               [0,0,1,0]])
output = np.array([[0],[1],[1],[0],[1]])
syn0 = np.random.random((4,4))
syn1 = np.random.random((4,1))
final_err = list()
gamma = 0.05      # learning rate
maxiter = 100000
for justanumber in range(maxiter):
    syn0_del = np.zeros_like(syn0)
    syn1_del = np.zeros_like(syn1)
    l2_err_sum = 0
    for i in range(len(l0)):
        this_data = l0[i,np.newaxis]               # 1 x 4
        l1 = sigmoid(np.matmul(this_data,syn0))    # 1 x 4
        l2 = sigmoid(np.matmul(l1,syn1))           # 1 x 1
        l2_err = output[i,:]-l2                    # 1 x 1
        #l2_delta=reduce(l2_err)
        l2_delta = reduce(l2)*l2_err               # 1 x 1
        l1_err = np.dot(syn1, l2_delta)            # 4 x 1
        #l1_delta=reduce(l1_err)
        # Element-wise product, so that each hidden unit gets its own delta
        l1_delta = reduce(l1).T*l1_err             # 4 x 1
        # Accumulate gradient for this point for layer 1
        syn1_del += np.matmul(l2_delta, l1).T
        #syn1=syn1.T
        #syn1+=l1.T*l2_delta
        #syn1=syn1.T
        # Accumulate gradient for this point for layer 0
        syn0_del += np.matmul(l1_delta, this_data).T
        #syn0=syn0.T
        #syn0-=l0[i,:].T*l1_delta
        #syn0=syn0.T
        # The error for this data point (squared error)
        l2_err_sum += np.mean(l2_err ** 2)
    l2_err_sum /= l0.shape[0]  # Mean of the squared errors
    syn0 += gamma * syn0_del
    syn1 += gamma * syn1_del
    print("iter: ", justanumber, "error: ", l2_err_sum)
    final_err.append(l2_err_sum)
# Predicting
l1 = sigmoid(np.matmul(l0,syn0))  # n x 4 * 4 x 4 = n x 4
l2 = sigmoid(np.matmul(l1,syn1))  # n x 4 * 4 x 1 = n x 1
print("Predicted: \n", l2)
print("Actual: \n", output)
plt.plot(np.array(final_err))
plt.show()
The output I get is:
Predicted:
[[0.05214011]
[0.97596354]
[0.97499515]
[0.03771324]
[0.97624119]]
Actual:
[[0]
[1]
[1]
[0]
[1]]
Therefore the network was able to predict all the toy training examples. (Note that with real data you would not want to fit the training data this closely, as it leads to overfitting.) You may get slightly different results, as the weights are initialised randomly. Also, as a rule of thumb, initialise the weights in [-0.01, +0.01] when you are not working on a specific problem for which you know a better initialisation.
Here is the convergence plot.
Note that you do not need to iterate over each example; you can process the whole batch with one matrix multiplication, which is much faster. Also, the above code does not have bias units. Make sure you include them when you re-implement the code.
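As a rough illustration of those two suggestions, here is a minimal full-batch sketch with bias terms. It is one possible way to write it, not a definitive implementation: the function names, the learning rate, the iteration count and the initialisation range are all assumptions chosen for this toy data.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train(X, y, hidden=4, lr=0.5, iters=5000, seed=0):
    """Full-batch gradient descent for a one-hidden-layer sigmoid network."""
    rng = np.random.default_rng(seed)
    W0 = rng.uniform(-0.5, 0.5, (X.shape[1], hidden))
    b0 = np.zeros((1, hidden))                 # hidden-layer bias
    W1 = rng.uniform(-0.5, 0.5, (hidden, 1))
    b1 = np.zeros((1, 1))                      # output-layer bias
    for _ in range(iters):
        a1 = sigmoid(X @ W0 + b0)              # (n, hidden), all samples at once
        a2 = sigmoid(a1 @ W1 + b1)             # (n, 1)
        d2 = (y - a2) * a2 * (1 - a2)          # output delta, element-wise
        d1 = (d2 @ W1.T) * a1 * (1 - a1)       # hidden delta, element-wise
        W1 += lr * a1.T @ d2
        b1 += lr * d2.sum(axis=0, keepdims=True)
        W0 += lr * X.T @ d1
        b0 += lr * d1.sum(axis=0, keepdims=True)
    return W0, b0, W1, b1

def predict(X, params):
    W0, b0, W1, b1 = params
    return sigmoid(sigmoid(X @ W0 + b0) @ W1 + b1)

X = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [1, 1, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
y = np.array([[0], [1], [1], [0], [1]], dtype=float)
params = train(X, y)
print(np.round(predict(X, params), 3))
```

The sums over axis 0 play the role of the per-sample accumulation in the loop version, and the bias updates use the same deltas as the weights with an implicit input of 1.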
I would recommend going through Raul Rojas' Neural Networks: A Systematic Introduction, Chapters 4, 6 and 7. Chapter 7 will tell you how to implement deeper networks in a simple way.
I have a data set of 3 different variables, each variable has 37 data points as follows:
Variable_1 = [0.489274770173646 0.534659090909091 0.496806966618287 0.593160935871933 0.542091836734694 0.514607775477341 0.580715497052410 0.542977656178750 0.624465240641712 0.644904791797447 0.444644611857190 0.464080100125156 0.522286821705426 0.507719139590466 0.612791008830612 0.561735261401557 0.524166666666667 0.526627218934911 0.449009900990099 0.472768878718535 0.488477561567263 0.576187425642902 0.558307692307692 0.609308792372882 0.647109905020352 0.513392857142857 0.454701120797011 0.557692307692308 0.511568509615385 0.440248676030394 0.500000000000000 0.593340146482712 0.518269230769230 0.623676307886835 0.563086974275214 0.609080188679245 0.769444444444444]
Variable_2 = [0.573717948717949 0.489656381486676 0.443821689259645 0.578812453392990 0.678328092243187 0.476432291666667 0.460748792270531 0.593650793650794 0.585645494152717 0.540435139573071 0.536423112870416 0.471528337362342 0.514469014469015 0.459801313718039 0.674409015942826 0.526881720430108 0.437327188940092 0.531890398342160 0.479985035540591 0.449145299145299 0.553381642512077 0.524932614555257 0.652630308880308 0.561587521131090 0.560003234675724 0.537254901960784 0.521990521327014 0.466041489059392 0.571461291800275 0.413770728190339 0.493939393939394 0.458024968229051 0.579528535980149 0.512145748987855 0.567205861018424 0.463562753036437 0.562938596491228]
Variable_3 = [0.630327868852459 0.521367521367521 0.467658730158730 0.485012755102041 0.523217247097844 0.449032738095238 0.574519230769231 0.594594594594595 0.544390243902439 0.581524147097918 0.487662337662338 0.497564726993079 0.417307692307692 0.609668109668110 0.508928571428572 0.511870845204179 0.444067796610169 0.562337662337663 0.494043887147335 0.530476190476191 0.484235294117647 0.502136752136752 0.632418524871355 0.528787878787879 0.619780219780220 0.416958041958042 0.552419354838710 0.586057692307692 0.461351186853317 0.495276653171390 0.524305555555555 0.655671296296296 0.496873496873497 0.462542087542088 0.660491689750693 0.772549019607843 0.558589870903674]
I put all three variables in a matrix, where the columns are the variables and the rows are the 37 data points.
I used the pca function in MATLAB and it gives me the following matrix:
PCA = 0.6370 0.3070 0.7071
0.3494 0.7026 -0.6199
0.6871 -0.6420 -0.3403
First question: what does each row and each column represent in the PCA matrix?
Second question: how can I use this matrix to plot each variable along its principal components in 3 dimensions?
Thank you, I really appreciate any help.
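For reference, in the coefficient matrix returned by MATLAB's pca each column is one principal component (ordered by decreasing variance) and each row corresponds to one of the original variables, so entry (i, j) is the weight of variable i in component j. The sketch below reproduces that layout with NumPy via an SVD of the mean-centred data. The data here is a random placeholder, not the 37 points from the question, and the signs of the columns may differ from MATLAB's output.

```python
import numpy as np

def pca(X):
    """PCA of X (rows = observations, columns = variables) via SVD."""
    mu = X.mean(axis=0)
    Xc = X - mu                                  # centre each variable
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    coeff = Vt.T                                 # columns = components, rows = variables
    score = Xc @ coeff                           # data expressed in component coordinates
    explained = s**2 / (X.shape[0] - 1)          # variance along each component
    return coeff, score, explained

# Placeholder data with the same shape as in the question: 37 points, 3 variables.
rng = np.random.default_rng(0)
X = rng.standard_normal((37, 3)) @ np.diag([3.0, 1.0, 0.3])
coeff, score, explained = pca(X)
print(coeff)
print(explained)
```

The three columns of score are the coordinates of each data point along the first, second and third principal components, which is what one would pass to a 3-D scatter plot for the second question.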