Can anyone recommend a website, or give me a brief overview of how backpropagation is implemented in a neural network? I understand the basic concept, but I'm unsure of how to go about writing the code.
Many of the sources I've found simply show the equations without explaining why they're used, and the variable names make the code difficult to follow.
Example:
void bpnn_output_error(double *delta, double *target, double *output, int nj, double *err)
{
int j;
double o, t, errsum;
errsum = 0.0;
for (j = 1; j <= nj; j++) {
o = output[j];
t = target[j];
delta[j] = o * (1.0 - o) * (t - o);
errsum += ABS(delta[j]);
}
*err = errsum;
}
In that example, can someone explain the purpose of
delta[j] = o * (1.0 - o) * (t - o);
Thanks.
The purpose of
delta[j] = o * (1.0 - o) * (t - o);
is to find the error of an output node in a backpropagation network.
o represents the output of the node, and t is the expected output value for the node.
The term o * (1.0 - o) is the derivative of a commonly used transfer function, the sigmoid. (Other transfer functions are not uncommon; they would require rewriting the code to use that function's first derivative instead of the sigmoid's. A mismatch between function and derivative would likely mean that training would not converge.) The node has an "activation" value that is fed through a transfer function to obtain the output o, like
o = f(activation)
The main thing is that backpropagation uses gradient descent, and the error gets backward-propagated by application of the Chain Rule. The problem is one of credit assignment, or blame if you will, for the hidden nodes whose output is not directly comparable to the expected value. We start with what is known and comparable, the output nodes. The error is taken to be proportional to the first derivative of the output times the raw error value between the expected output and actual output.
So more symbolically, we'd write that line as
delta[j] = f'(activation_j) * (t_j - o_j)
where f is your transfer function, and f' is the first derivative of it.
Further back in the hidden layers, the error at a node is its estimated contribution to the errors found at the next layer. So the deltas from the succeeding layer are multiplied by the connecting weights, and those products are summed. That sum is multiplied by the first derivative of the activation of the hidden node to get the delta for a hidden node, or
delta[j] = f'(activation_j) * Sum(delta[k] * w_jk)
where j now references a hidden node and k a node in a succeeding layer.
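To make those two formulas concrete, here is a minimal sketch in Python rather than the C of the question (the function and variable names are my own, and a sigmoid transfer function is assumed):
import numpy as np
def sigmoid(x):
    # common transfer function f; its derivative is f'(x) = f(x) * (1 - f(x))
    return 1.0 / (1.0 + np.exp(-x))
def output_deltas(output, target):
    # delta_j = f'(activation_j) * (t_j - o_j); for the sigmoid, f'(activation_j) = o_j * (1 - o_j)
    return output * (1.0 - output) * (target - output)
def hidden_deltas(hidden_output, next_deltas, weights):
    # weights[j, k] is the weight connecting hidden node j to node k in the succeeding layer
    # delta_j = f'(activation_j) * Sum_k(delta_k * w_jk)
    return hidden_output * (1.0 - hidden_output) * (weights @ next_deltas)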
(t - o) is the error in the output of the network, since t is the target output and o is the actual output. It is stored in a normalized form in the delta array. The method used to normalize depends on the implementation, and the o * (1.0 - o) factor seems to be doing that (I could be wrong about that assumption).
This normalized error is accumulated for the entire training set to judge when the training is complete: usually when errsum is below some target threshold.
Actually, if you know the theory, the programs should be easy to understand. You can read a textbook and work through some small examples with pencil and paper to figure out the exact steps of the propagation. This is a general principle for implementing numerical programs: you must understand the details on small cases first.
If you know Matlab, I'd suggest you read some Matlab source code (e.g. here), which is easier to understand than C.
For the code in your question, the names are quite self-explanatory: output is probably the array of your predictions, target is probably the array of training labels, and delta is the error between the predictions and the true values; it also serves as the value used to update the weight vector.
Essentially, what backprop does is run the network on the training data, observe the output, and then adjust the weights, working iteratively from the output nodes back toward the input nodes.
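Once the deltas are known, each weight is nudged in the direction that reduces the error. A minimal gradient-descent sketch (again Python, with made-up names; the posted C function only computes the deltas, not this step):
import numpy as np
def update_weights(weights, layer_input, deltas, learning_rate=0.1):
    # weights[i, j] connects node i of the previous layer to node j of this layer
    # w_ij <- w_ij + eta * delta_j * o_i
    return weights + learning_rate * np.outer(layer_input, deltas)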
Related
What is the idea behind Double DQN?
In Double DQN, the Bellman equation used to calculate the target Q value for updating the online network is:
value = reward + discount_factor * target_network.predict(next_state)[argmax(online_network.predict(next_state))]
The Bellman equation used to calculate the Q value updates in the original DQN is:
value = reward + discount_factor * max(target_network.predict(next_state))
but the target network used to evaluate the action is updated using the weights of the online_network, so the value fed into the target is basically the old Q value of the action.
Any ideas on how adding another network, based on the weights of the first network, helps?
I really liked the explanation from here:
https://becominghuman.ai/beat-atari-with-deep-reinforcement-learning-part-2-dqn-improvements-d3563f665a2c
"This is actually quite simple: you probably remember from the previous post that we were trying to optimize the Q function defined as follows:
Q(s, a) = r + γ maxₐ’(Q(s’, a’))
Because this definition is recursive (the Q value depends on other Q values), in Q-learning we end up training a network to predict its own output, as we pointed out last time.
The problem of course is that at each minibatch of training, we are changing both Q(s, a) and Q(s’, a’), in other words, we are getting closer to our target but also moving our target! This can make it a lot harder for our network to converge.
It thus seems like we should instead use a fixed target so as to avoid this problem of the network “chasing its own tail”, but of course that isn’t possible since the target Q function should get better and better as we train."
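As a rough illustration of the difference (a sketch only, not code from the article; online_network and target_network are assumed to be objects whose predict method returns one Q value per action for a given state):
import numpy as np
def dqn_target(reward, next_state, discount_factor, target_network):
    # Original DQN: the target network both selects and evaluates the best next action.
    return reward + discount_factor * np.max(target_network.predict(next_state))
def double_dqn_target(reward, next_state, discount_factor, online_network, target_network):
    # Double DQN: the online network selects the action,
    # the slower-moving target network evaluates it.
    best_action = np.argmax(online_network.predict(next_state))
    return reward + discount_factor * target_network.predict(next_state)[best_action]
Decoupling action selection from action evaluation is usually cited as reducing the overestimation of Q values that the max operator tends to produce, while the slowly updated target network still gives the online network a relatively fixed target to train against.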
I'm trying to build a neural network that will take in the solutions to a system of ODEs and predict the parameters of the system. I'm using Julia and, in particular, the DiffEqFlux package. The structure of the network is a few simple Dense layers chained together that predict some intermediate parameters (in this case, some chemical reaction free energies), which then feed into some deterministic (non-trained) layers that convert those parameters into the ones that go into the system of equations (in this case, reaction rate constants). I've tried two different approaches from here:
Chain the ODE solve directly on as the last layer of the network. In this case, the loss function is just comparing the inputs to the outputs.
Have the ODE solve in the loss function, so the network output is just the parameters.
However, in neither case can I get Flux.train! to actually run.
A silly little example for the first option that gives the same error I'm getting (I've tried to keep as many things parallel to my actual case as possible, i.e. the solver, etc., although I did omit the intermediate deterministic layers since they don't seem to make a difference) is shown below.
using Flux, DiffEqFlux, DifferentialEquations
# let's use Chris' favorite example, Lotka-Volterra
function lotka_volterra(du,u,p,t)
x, y = u
α, β, δ, γ = p
du[1] = dx = α*x - β*x*y
du[2] = dy = -δ*y + γ*x*y
end
u0 = [1.0,1.0]
tspan = (0.0,10.0)
# generate a couple sets of solutions to train on
training_params = [[1.5,1.0,3.0,1.0], [1.4,1.1,3.1,0.9]]
training_sols = [solve(ODEProblem(lotka_volterra, u0, tspan, tp)).u[end] for tp in training_params]
model = Chain(Dense(2,3), Dense(3,4), p -> diffeq_adjoint(p, ODEProblem(lotka_volterra, u0, tspan, p), Rodas4())[:,end])
# in this case we just want outputs to match inputs
# (actual parameters we're after are outputs of next-to-last layer)
training_data = zip(training_sols, training_sols)
# mean squared error loss
loss(x,y) = Flux.mse(model(x), y)
p = Flux.params(model[1:2])
Flux.train!(loss, p, training_data, ADAM(0.001))
# gives TypeError: in typeassert, expected Float64, got ForwardDiff.Dual{Nothing, Float64, 8}
I've tried all three solver layers, diffeq_adjoint, diffeq_rd, and diffeq_fd, none of which work, but all of which give different errors that I'm having trouble parsing.
For the other option (which I'd actually prefer, but either way would work), just replace the model and loss function definitions as:
model = Chain(Dense(2,3), Dense(3,4))
function loss(x,y)
p = model(x)
sol = diffeq_adjoint(p, ODEProblem(lotka_volterra, u0, tspan, p), Rodas4())[:,end]
Flux.mse(sol, y)
end
The same error is thrown as above.
I've been hacking at this for over a week now and am completely stumped; any ideas?
You're running into https://github.com/JuliaDiffEq/DiffEqFlux.jl/issues/31, i.e. forward-mode AD for the Jacobian doesn't play nice with Flux.jl right now. To get around this, use Rodas4(autodiff=false) instead.
I'm completely new to reinforcement learning, so I may be wrong.
My questions are:
Is the Q-Learning equation ( Q(s, a) = r + y * max(Q(s', a')) ) used in DQN only for computing a loss function?
Is the equation recursive? Assume I use DQN for, say, playing Atari Breakout, where the number of possible states is very large (assuming the state is a single game frame), so it's not efficient to create a matrix of all the Q-values. The equation should update the Q-value of a given [state, action] pair, so what will it do in the case of DQN? Will it call itself recursively? If it does, the equation can't be calculated, because the recursion will never stop.
I've already tried to find what I'm looking for and I've seen many tutorials, but almost none of them show the background; they just implement it using a Python library like Keras.
Thanks in advance, and I apologise if something sounds dumb; I just don't get it.
Is the Q-Learning equation ( Q(s, a) = r + y * max(Q(s', a')) ) used in DQN only for computing a loss function?
Yes, generally that equation is only used to define our losses. More specifically, it is rearranged a bit; that equation is what we expect to hold, but it generally does not yet precisely hold during training. We subtract the right-hand side from the left-hand side to compute a (temporal-difference) error, and that error is used in the loss function.
Is the equation recursive? Assume I use DQN for, say, playing Atari Breakout, where the number of possible states is very large (assuming the state is a single game frame), so it's not efficient to create a matrix of all the Q-values. The equation should update the Q-value of a given [state, action] pair, so what will it do in the case of DQN? Will it call itself recursively? If it does, the equation can't be calculated, because the recursion will never stop.
Indeed the space of state-action pairs is much too large to enumerate them all in a matrix/table. In other words, we can't use Tabular RL. This is precisely why we use a Neural Network in DQN though. You can view Q(s, a) as a function. In the tabular case, Q(s, a) is simply a function that uses s and a to index into a table/matrix of values.
In the case of DQN and other Deep RL approaches, we use a Neural Network to approximate such a "function". We use s (and potentially a, though not really in the case of DQN) to create features based on that state (and action). In the case of DQN and Atari games, we simply take a stack of raw images/pixels as features. These are then used as inputs for the Neural Network. At the other end of the NN, DQN provides Q-values as outputs. In the case of DQN, multiple outputs are provided; one for every action a. So, in conclusion, when you read Q(s, a) you should think "the output corresponding to a when we plug the features/images/pixels of s as inputs into our network".
Further question from comments:
I think I still don't get the idea... Let's say we did one iteration through the network with state S and we got the following output [A = 0.8, B = 0.1, C = 0.1] (where A, B and C are possible actions). We also got a reward R = 1 and set y (a.k.a. gamma) to 0.95. Now, how can we put these variables into the loss function formula https://imgur.com/a/2wTj7Yn? I don't understand what the prediction is if the DQN outputs which action to take. Also, what's the target Q? Could you post the formula with the variables filled in, please?
First a small correction: DQN does not output which action to take. Given inputs (a state s), it provides one output value per action a, which can be interpreted as an estimate of the Q(s, a) value for the input state s and the action a corresponding to that particular output. These values are typically used afterwards to determine which action to take (for example by selecting the action corresponding to the maximum Q value), so in some sense the action can be derived from the outputs of DQN, but DQN does not directly provide actions to take as outputs.
Anyway, let's consider the example situation. The loss function from the image is:
loss = (r + gamma max_a' Q-hat(s', a') - Q(s, a))^2
Note that there's a small mistake in the image: it has the old state s inside Q-hat instead of the new state s'. The s' written above is correct.
In this formula:
r is the observed reward
gamma is (typically) a constant value
Q(s, a) is one of the output values from our Neural Network that we get when we provide it with s as input. Specifically, it is the output value corresponding to the action a that we have executed. So, in your example, if we chose to execute action A in state s, we have Q(s, A) = 0.8.
s' is the state we happen to end up in after having executed action a in state s.
Q-hat(s', a') (which we compute once for every possible subsequent action a') is, again, one of the output values from our Neural Network. This time, it's a value we get when we provide s' as input (instead of s), and again it will be the output value corresponding to action a'.
The Q-hat instead of Q there is because, in DQN, we typically actually use two different Neural Networks. Q-values are computed using the same Neural Network that we also modify by training. Q-hat-values are computed using a different "Target Network". This Target Network is typically a "slower-moving" version of the first network. It is constructed by occasionally (e.g. once every 10K steps) copying the other Network, and leaving its weights frozen in between those copy operations.
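Putting the numbers from the comment into the loss, a minimal sketch in Python (the function name is mine, and the target-network outputs for s' are made up, since the comment does not give them):
import numpy as np
def dqn_loss(q_values, target_q_values, action_index, reward, gamma=0.95):
    # q_values: outputs of the trained network for state s, one value per action
    # target_q_values: outputs of the target network Q-hat for the next state s'
    q_sa = q_values[action_index]                          # Q(s, a) for the executed action
    td_target = reward + gamma * np.max(target_q_values)   # r + gamma * max_a' Q-hat(s', a')
    return (td_target - q_sa) ** 2
# Action A (index 0) was executed in s, reward r = 1, gamma = 0.95;
# the Q-hat(s', a') values below are hypothetical.
loss = dqn_loss(np.array([0.8, 0.1, 0.1]), np.array([0.5, 0.2, 0.3]), action_index=0, reward=1.0)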
Firstly, the Q function is used both in the loss function and for the policy. The actual output of your Q function and the 'ideal' one are used to calculate the loss. Taking the action with the highest Q-function output over all possible actions in a state is your policy.
Secondly, no, it's not recursive. The equation is actually slightly different from what you have posted (perhaps a mathematician can correct me on this). It is actually Q(s, a) := r + y * max(Q(s', a')). Note the colon before the equals sign. This is called the assignment operator and means that we update the left side of the equation so that it is equal to the right side once (not recursively). You can think of it as being the same as the assignment operator in most programming languages (x = x + 1 doesn't cause any problems).
The Q values will propagate through the network as you keep performing updates anyway, but it can take a while.
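For comparison, here is a minimal sketch of the corresponding tabular update (the function and the learning-rate blending are my own additions; the pure assignment form discussed above corresponds to alpha = 1):
import numpy as np
def q_learning_update(Q, s, a, r, s_next, gamma=0.95, alpha=0.1):
    # Q is a table of shape (num_states, num_actions); s and a index into it.
    target = r + gamma * np.max(Q[s_next])            # r + gamma * max_a' Q(s', a')
    Q[s, a] = (1 - alpha) * Q[s, a] + alpha * target  # blend old estimate with the new target
    return Q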
I am trying to minimize a highly non-linear function by optimizing three unknown parameters a, b, and c0. I'm attempting to replicate some governing equations of a casino roulette ball in Python 3.
Here is the link to the research paper:
http://www.dewtronics.com/tutorials/roulette/documents/Roulette_Physik.pdf
I will be referencing equations (35) and (40) in the paper.
Basically, I take stopwatch lap measurements of the roulette ball spinning on the wheel. For each successive lap, the lap time will increase because of losses of momentum to non-conservative forces of friction. Then I take these time measurements and fit equation (35) using a Levenberg-Marquardt least squares method in equation (40).
My question is twofold:
(1) I'm using scipy.optimize.least_squares() with method='lm', and I'm not sure how to write the objective function! Right now I have the function written exactly as it is in the paper:
def fall_time(k,a,b,c0):
F = (1 / (a * b)) * (c0 - np.arcsinh(c0) * np.exp(a * k * 2 * np.pi))
return F
def parameter_estimation_function(x0,tk):
a = x0[0]
b = x0[1]
c0 = x0[2]
S = 0
for i,t in enumerate(tk):
k = i + 1
S += (t - fall_time(k,a,b,c0))**2
return [S,1,1]
sol = least_squares(parameter_estimation_function,[0.1,0.8,-0.1],args=([tk1]),method='lm',jac='2-point',max_nfev=2000)
print(sol)
Now, in the documentation examples, I never saw the objective function written the way I have it. In the documentation, the objective function always returns the residual, not the square of the residual. Additionally, in the documentation they never use the sum! So I'm wondering if the sum and the square are automatically handled under the hood of least_squares()?
(2) Perhaps my second question is a result of my failure to understand how to write the objective function. But anyhow, I'm having trouble getting the algorithm to converge on the minimum. I know this is because the Levenberg-Marquardt algorithm is "greedy" and stops near the closest minimum, but I figured that I would at least be able to converge on about the same result given different initial guesses. With slight alterations in the initial guess, I'm getting parameter results with different signs. Additionally, I've yet to find a combination of initial guesses that allows the algorithm to converge! It always times out before it finds the solution. I've even increased the number of function evaluations to 10,000 to see if that would help. To no avail!
Perhaps somebody could shed some light on my mistakes here! I'm still relatively new to python and the scipy library!
Here is some sample data for tk that I've measured myself from the video here: https://www.youtube.com/watch?v=0Zj_9ypBnzg
tk = [0.52,1.28,2.04,3.17,4.53,6.22]
tk1 = [0.51,1.4,2.09,3,4.42,6.17]
tk2 = [0.63,1.35,2.19,3.02,4.57,6.29]
tk3 = [0.63,1.39,2.23,3.28,4.70,6.32]
tk4 = [0.57,1.4,2.1,3.06,4.53,6.17]
Thanks
1) Yes, as you suspected the sum and the square of the residuals are automatically handled.
2) Hard to say, since I'm not deeply familiar with the problem (e.g., how many local minima exist, what constitutes a 'reasonable' result, etc.). I may investigate more later.
But for kicks I fiddled with some of the values to see what would happen. For example, you can just replace the 1/b constant with a standalone variable b_inv, and this seemed to stabilize the results quite a bit. Here's the code I used to check results. (Note that I rewrote the objective function for brevity. It simply leverages the element-wise operations of numpy arrays, without changing the overall result.)
import numpy as np
from scipy.optimize import least_squares
def fall_time(k,a,b_inv,c0):
return (b_inv / a) * (c0 - np.arcsinh(c0) * np.exp(a * k * 2 * np.pi))
def parameter_estimation_function(x,tk):
return np.asarray(tk) - fall_time(k=np.arange(1,len(tk)+1), a=x[0],b_inv=x[1],c0=x[2])
tk_samples = [
[0.52,1.28,2.04,3.17,4.53,6.22],
[0.51,1.4,2.09,3,4.42,6.17],
[0.63,1.35,2.19,3.02,4.57,6.29],
[0.63,1.39,2.23,3.28,4.70,6.32],
[0.57,1.4,2.1,3.06,4.53,6.17]
]
for i in range(len(tk_samples)):
sol = least_squares(parameter_estimation_function,[0.1,1.25,-0.1],
args=(tk_samples[i],),method='lm',jac='2-point',max_nfev=2000)
print(sol.x)
with console output:
[ 0.03621789 0.64201913 -0.12072879]
[ 3.59319972e-02 1.17129458e+01 -6.53358716e-03]
[ 3.55516005e-02 1.48491493e+01 -5.31098257e-03]
[ 3.18068316e-02 1.11828091e+01 -7.75329834e-03]
[ 3.43920725e-02 1.25160378e+01 -6.36307506e-03]
This question is somewhat related to a previous question of mine, where I didn't quite get the right solution. Link: Earlier SO-thread
I am solving time-dependent PDEs with one spatial dimension (e.g. the heat equation - see link below). I'm using the numerical method of lines, i.e. discretizing the spatial derivatives to yield a system of ODEs which are readily solved in Modelica (using the Dymola tool). My problems arise when I simulate the system, or when I plot the results, to be precise. The equations themselves appear to be solved correctly, but I want to express the spatial variation of all the discretized state variables at specific points in time rather than the individual time-varying behavior of each discrete state.
The strategy leading up to my problems is illustrated in this Youtube tutorial, which by the way is not made by me. As you can see at the very end of the tutorial, the time-varying behavior of the temperature is plotted for all the discrete points in the rod, individually. What I would like is a plot showing the temperature through the rod at a specific time, that is the temperature as a function of the spatial coordinate. My strategy to achieve this, which I'm struggling with, is: Given a state vector of N entries:
Real[N] T "Temperature";
..I would use the plotArray Dymola function as shown below.
plotArray( {i for i in 1:N}, {T[i] for i in 1:N} )
Intuitively, this would yield a plot showing the temperature as a function of the spatial coordinate, or the number in the line of discrete units, to be precise. Although this command yields a result, all T-values appear to be 0 in the plot, which is definitely not the case. My question is: How can I successfully obtain and plot the temperatures at all the discrete points at a given time? Thanks in advance for your help.
The code for the problem is as indicated below.
model conduction
parameter Real rho = 1;
parameter Real Cp = 1;
parameter Real L = 1;
parameter Real k = 1;
parameter Real Tlo = 0;
parameter Real Thi = 100;
parameter Real Tinit = 30;
parameter Integer N = 10 "Number of discrete segments";
Real T[N-1] "Temperatures";
Real deltaX = L/N;
initial equation
for i in 1:N-1 loop
T[i] = Tinit;
end for;
equation
rho*Cp*der(T[1]) = k*( T[2] - 2*T[1] + Thi) /deltaX^2;
rho*Cp*der(T[N-1]) = k*( Tlo - 2*T[N-1] + T[N-2]) /deltaX^2;
for i in 2:N-2 loop
rho*Cp*der(T[i]) = k*( T[i+1] - 2*T[i] + T[i-1]) /deltaX^2;
end for;
annotation (uses(Modelica(version="3.2")));
end conduction;
Additional edit: The simulations show clearly that, for example, T[3], that is the temperature of discrete segment no. 3, starts out at 30 and ends up at 70 degrees. When I write T[3] in my command window, however, I get T[3] = 0.0 in return. Why is that? This is at the heart of the problem, because the plotArray function would work if I managed to extract the actual variable values at specific times and not just 0.0.
Suggested solution: This is a rather tedious solution to achieve what I want, and I hope someone knows a better solution. When I run the simulation in Dymola, the software generates a .mat-file containing the values of the variables throughout the time of the simulation. I am able to load this file into MATLAB and manually extract the variables of my choice for plotting. For the problem above, I wrote the following command:
plot( [1:9]' , data_2(2:2:18 , 10)' )
This command will plot the temperatures (as the temperatures are stored together with their derivatives in the data_2 array in the .mat-file) against the respective number of the discrete segment/element. I was really hoping to do this inside Dymola, that is, to avoid using MATLAB for this. For this specific problem, the number of variables was low on account of the simplicity of the problem, but I can easily imagine a .mat-file which is significantly harder to navigate through manually like I just did.
Although you do not mention it explicitly I assume that you enter your plotArray command in Dymola's command window. That won't work directly, since the variables you see there do not include your simulation results: If I simulate your model, and then enter T[:] in Dymola's command window, then the printed result is
T[:]
= {0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0}
I'm not a Dymola expert, and the only solution I've found (to actively store and load the desired simulation results) is quite cumbersome:
simulateModel("conduction", resultFile="conduction.mat")
n = readTrajectorySize("conduction.mat")
X = readTrajectory("conduction.mat", {"Time"}, n)
Y = readTrajectory("conduction.mat", {"T[1]", "T[2]", "T[3]"}, n)
plotArrays(X[1, :], transpose(Y))