Calculating Q value in DQN with experience replay - neural-network

Consider the Deep Q-Learning algorithm:
1 initialize replay memory D
2 initialize action-value function Q with random weights
3 observe initial state s
4 repeat
5 select an action a
6 with probability ε select a random action
7 otherwise select a = argmax_a' Q(s, a')
8 carry out action a
9 observe reward r and new state s’
10 store experience <s, a, r, s’> in replay memory D
11
12 sample random transitions <ss, aa, rr, ss’> from replay memory D
13 calculate target for each minibatch transition
14 if ss’ is terminal state then tt = rr
15 otherwise tt = rr + γ max_a' Q(ss', aa')
16 train the Q network using (tt - Q(ss, aa))^2 as loss
17
18 s = s'
19 until terminated
In step 16 the value of Q(ss, aa) is used to calculate the loss. When is this Q value calculated: at the time the action was taken, or during the training itself?
Since replay memory only stores <s, a, r, s'> and not the Q-value, is it safe to assume the Q-value will be calculated at training time?

Yes, in step 16, when training the network, you use the loss function (tt - Q(ss, aa))^2 because you want to update the network weights to approximate the most recent Q-values, computed as rr + γ max_a' Q(ss', aa') and used as the target. Therefore, Q(ss, aa) is the current estimate, which is computed at training time.
Here you can find a Jupyter Notebook with a simple Deep Q-learning implementation that may be helpful.
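To make the timing concrete, the target computation of steps 13–16 can be sketched as below. This is a minimal illustration, not the notebook's code; `q_network` is a hypothetical callable that returns current Q-value estimates of shape (batch, n_actions):

```python
import numpy as np

def compute_targets(q_network, minibatch, gamma=0.99):
    """Compute TD targets for transitions (s, a, r, s', done).

    Q(ss', aa') is evaluated with the *current* network at training
    time, not stored in the replay memory.
    """
    targets = []
    for s, a, r, s_next, done in minibatch:
        if done:  # ss' is terminal: tt = rr
            tt = r
        else:     # tt = rr + gamma * max_a' Q(ss', aa')
            tt = r + gamma * np.max(q_network(np.array([s_next]))[0])
        targets.append(tt)
    return np.array(targets)
```

The loss term Q(ss, aa) is likewise recomputed by the network in the same training step, which is exactly why only <s, a, r, s'> needs to be stored.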

Related

Questions about DTMC: how to model this process?

In a certain manufacturing system, there are 2 machines, M1 and M2.
M1 is a fast, high-precision machine whereas M2 is a slow, low-precision
machine. M2 is employed only when M1 is down, and it is
assumed that M2 does not fail. Assume that the processing time of
parts on M1, the processing time of parts on M2, the time to failure
of M1, and the repair time of M1 are independent geometric random
variables with parameters p1, p2, f, and r, respectively. Identify a
suitable state space for the DTMC model of the above system and
compute the TPM. Investigate the steady-state behavior of the DTMC.
How do I model this as a DTMC? What is the state space? I have tried a state space like this:
0: M1 is working and does not fail
1: M1 has failed, M2 is working, M1 is under repair and has not finished repairing
But there are still some problems. What happens after M1 finishes a part: does it immediately process the next one, or does it first decide whether to fail? What happens if M1 fails while processing a part? What is the transition probability matrix?
Thank you very much for your help!!!!
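Whatever state space you settle on, the steady-state part can be checked numerically once you have a TPM. A minimal sketch in Python (NumPy), using only a hypothetical two-state up/down chain for M1 (per-step failure probability f, repair probability r) and ignoring the processing-time states, which your full model would add:

```python
import numpy as np

# Hypothetical reduced chain: state 0 = M1 up, state 1 = M1 down.
f, r = 0.1, 0.4
P = np.array([[1 - f, f],
              [r, 1 - r]])

# Steady state: solve pi P = pi together with sum(pi) = 1.
A = np.vstack([P.T - np.eye(2), np.ones(2)])
b = np.array([0.0, 0.0, 1.0])
pi, *_ = np.linalg.lstsq(A, b, rcond=None)
print(pi)  # long-run fraction of time M1 is up / down
```

For these f and r the chain spends r/(f+r) = 0.8 of the time in the up state; the same linear-system approach works for the larger state space once the TPM is written down.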

Is there any case where Bimodal will be better than Static Not Taken?

Considering these two methods:
Dynamic Bimodal:
Here we have 4 states, 2 for each prediction (taken or not taken), switching between taken and not taken after 2 consecutive wrong predictions.
Static Not Taken:
Here the algorithm always makes a single fixed prediction (in this case, not taken) and never changes it.
I tested both algorithms with the following C code:
for(i=0; i<4; i++) {
}
and analyzing the if conditional.
for(i=0; i<4; i++) {
if( i%2 ) {
}
else {
}
}
In both cases the predictors come out even (they predict right and wrong the same number of times).
Is there any simple program where Bimodal will be better than Static Not Taken?
The Static Not Taken (SNT) predictor is almost always (much) worse than any other predictor. The main reason is that it is terrible at predicting the control flow of loops, because it predicts not taken at every iteration.
Let's assume that the first C loop will be compiled to something like this:
loop body
compute loop condition
branch to the loop body if condition met
So there is only one branch, at the end. The SNT predictor will predict not taken 4 times, but the branch is taken 3 times, so the accuracy is 25%. On the other hand, a bimodal predictor with an initial state of 10 or 11 [1] will achieve an accuracy of 75%. The initial states 01 and 00 will achieve accuracies of 50% and 25%, respectively. 10 and 11 are considered to be good initial states.
Let's assume that the second C loop will be compiled to something like this:
compute the if condition
branch to the else body if condition met
the if body
non-conditional branch to the end of the loop
the else body
compute loop condition
branch to the loop body if condition met
So there are two conditional branches. The SNT predictor will predict not taken 8 times, but 5 of those are mispredictions (there are 5 takens and 3 not-takens [2]), so the accuracy is 37.5%. For the bimodal predictor, let's assume that both branches share the same counter. A bimodal predictor with an initial state of 10 or 11 will achieve an accuracy of 63%. A bimodal predictor with an initial state of 00 or 01 will achieve an accuracy of 25% or 50%, respectively. If each branch uses a different counter with the same initial state, the calculations are similar.
[1] Where 00 and 01 represent not taken and 10 and 11 represent taken.
[2] T, T, NT, T, T, T, NT, NT.
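The accuracies above are easy to verify by simulating both predictors over the branch-outcome sequences; a small Python sketch (the sequences come from the two loops, footnote [2] giving the second):

```python
def snt_accuracy(outcomes):
    """Static Not Taken: always predict not-taken."""
    return sum(not t for t in outcomes) / len(outcomes)

def bimodal_accuracy(outcomes, state=3):
    """2-bit saturating counter: states 0,1 predict not-taken; 2,3 taken."""
    correct = 0
    for taken in outcomes:
        correct += ((state >= 2) == taken)
        state = min(3, state + 1) if taken else max(0, state - 1)
    return correct / len(outcomes)

loop1 = [True, True, True, False]                             # first loop
loop2 = [True, True, False, True, True, True, False, False]   # second loop

print(snt_accuracy(loop1), bimodal_accuracy(loop1, state=3))  # 0.25 0.75
print(snt_accuracy(loop2), bimodal_accuracy(loop2, state=3))  # 0.375 0.625
```

Passing `state=2` (10) gives the same results, while `state=0` and `state=1` reproduce the 25% and 50% figures for the first loop.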

How to compute & plot Equal Error Rate (EER) from FAR/FRR values using Matlab

I have the following FAR/FRR values. I want to compute the EER and then plot it in Matlab.
FAR FRR
19.64 20
21.29 18.61
24.92 17.08
19.14 20.28
17.99 21.39
16.83 23.47
15.35 26.39
13.20 29.17
7.92 42.92
3.96 60.56
1.82 84.31
1.65 98.33
26.07 16.39
29.04 13.13
34.49 9.31
40.76 6.81
50.33 5.42
66.83 1.67
82.51 0.28
Is there a Matlab function available to do this? Can somebody explain this to me? Thanks.
Let me try to answer your question.
1) For your data, the EER can be taken as the mean/max/min of [19.64, 20].
1.1) The idea of the EER is to measure system performance so it can be compared with other systems (the lower the better), by finding the point where the False Alarm Rate (FAR) and the False Reject Rate (FRR, or miss rate) are equal, or, if no point is exactly equal, the point where their distance is minimal.
In your data, [19.64, 20] gives the minimum distance, so it can be used as the EER. You can take the mean, max, or min of these two values; since the EER is meant for comparing systems, just make sure the other systems use the same convention (mean/max/min) to pick their EER value.
The difference among mean/max/min can be ignored when there is a large amount of data. In some speaker verification tasks there are 100k data samples.
2) To understand the EER, it is best to compute it yourself. Here is how.
You need two things:
A) The system score for each test case (trial)
B) The true/false label for each trial
Once you have A and B, create [trial, score, true/false] triples and sort them by score. Then loop through the scores, e.g. from min to max; at each step, treat that score as the threshold and compute the FAR and FRR. After the loop, find the point where FAR and FRR are (most nearly) equal.
For the code you can refer to my pyeer.py , in function processDataTable2
https://github.com/StevenLOL/Research_speech_speaker_verification_nist_sre2010/blob/master/SRE2010/sid/pyeer.py
This function is written for the NIST SRE 2010 evaluation.
3) There are other measures similar to the EER, such as minDCF, which only changes the weights of the FAR and FRR. You can refer to the "Performance Measure" section of http://www.nist.gov/itl/iad/mig/sre10results.cfm
4) You can also refer to this package https://sites.google.com/site/bosaristoolkit/ and to DETware_v2.1.tar.gz at http://www.itl.nist.gov/iad/mig/tools/ for computing and plotting the EER in Matlab.
Plotting in DETWare_v2.1
Pmiss = 1:50; Pfa = 50:-1:1;
Plot_DET(Pmiss/100.0, Pfa/100.0, 'r')
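The score-sweep procedure described in 2) can also be sketched outside Matlab; here is a minimal Python illustration, where `eer_from_scores` is a hypothetical helper name (not from pyeer.py):

```python
import numpy as np

def eer_from_scores(scores, labels):
    """Sweep each score as a threshold; labels True = genuine trial.

    At each threshold compute FAR (impostors accepted) and FRR
    (genuines rejected), and keep the closest-to-equal point.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    best = None
    for t in np.sort(scores):
        far = np.mean(scores[~labels] >= t)  # impostor scores above threshold
        frr = np.mean(scores[labels] < t)    # genuine scores below threshold
        if best is None or abs(far - frr) < abs(best[0] - best[1]):
            best = (far, frr, t)
    return best  # (FAR, FRR, threshold) at the near-equal point
```

With perfectly separated scores this returns FAR = FRR = 0; with real data it returns the operating point closest to the diagonal of the DET curve.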
FAR(t) and FRR(t) are parameterized by the threshold t. They are cumulative distributions, so they should be monotonic in t. Your data as listed is not monotonic, so if it is indeed FAR and FRR, the measurements were not made in order. For the sake of clarity, we can sort them:
FAR FRR
1 1.65 98.33
2 1.82 84.31
3 3.96 60.56
4 7.92 42.92
5 13.2 29.17
6 15.35 26.39
7 16.83 23.47
8 17.99 21.39
9 19.14 20.28
10 19.64 20
11 21.29 18.61
12 24.92 17.08
13 26.07 16.39
14 29.04 13.13
15 34.49 9.31
16 40.76 6.81
17 50.33 5.42
18 66.83 1.67
19 82.51 0.28
This ordering is for increasing FAR, which assumes a distance score; if you have a similarity score, FAR would be sorted in decreasing order.
Loop over FAR until it becomes larger than FRR, which occurs at row 11. Then interpolate the crossover value between rows 10 and 11. This is your equal error rate.
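The crossover interpolation can be sketched in Python (NumPy) rather than Matlab, using the sorted table above:

```python
import numpy as np

# FAR/FRR pairs from the question, sorted by increasing FAR
far = np.array([1.65, 1.82, 3.96, 7.92, 13.20, 15.35, 16.83, 17.99,
                19.14, 19.64, 21.29, 24.92, 26.07, 29.04, 34.49,
                40.76, 50.33, 66.83, 82.51])
frr = np.array([98.33, 84.31, 60.56, 42.92, 29.17, 26.39, 23.47, 21.39,
                20.28, 20.00, 18.61, 17.08, 16.39, 13.13, 9.31,
                6.81, 5.42, 1.67, 0.28])

diff = far - frr
i = np.argmax(diff > 0)  # first row where FAR exceeds FRR (row 11)
# linear interpolation between rows i-1 and i for the FAR = FRR point
frac = -diff[i - 1] / (diff[i] - diff[i - 1])
eer = far[i - 1] + frac * (far[i] - far[i - 1])
print(eer)  # ~19.8%
```

The same interpolation written on the FRR column gives an identical value, which is a useful consistency check.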

Regarding time-scale issue in NetLogo

I am a new user of NetLogo. I have a system of reactions (converted to ordinary differential equations) which can be solved using Matlab. I want to develop the same model in NetLogo, for comparison with the Matlab results. I am confused about time/ticks, because NetLogo uses "ticks" as the time increment, whereas Matlab uses time in seconds. How do I convert my Matlab seconds to a number of ticks? Can anyone help me write the code? The model is:
A + B ---> C (with rate constant k1 = 1e-6)
2A+ C ---> D (with rate constant k2 = 3e-7)
A + E ---> F (with rate constant k3 = 2e-5)
Initial values are A = B = C = 500, D = E = F = 10
Initial time t=0 sec and final time t=6 sec
I have a general comment first: NetLogo is intended for agent-based modelling (ABM), which has multiple entities with different characteristics interacting in some way. ABM is not really an appropriate methodology for solving ODEs. If your goal is simply to build your model in something other than Matlab for comparison, rather than specifically requiring NetLogo, I can recommend Vensim as more appropriate. Having said that, you can build the model you want in NetLogo; it is just very awkward.
NetLogo handles time discretely rather than continuously. You can have any number of ticks per second (I would suggest 10, so that the final time of 6 seconds is 60 ticks). You will need to convert your equations into a discrete form, so your rates would be something like k1-discrete = k1 / 10. You may have precision problems with very small numbers.
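To illustrate the discretization (in plain Python, not NetLogo code), here is the per-tick update with 10 ticks per second, assuming mass-action rate laws for the three reactions; each pass through the loop corresponds to what one NetLogo tick would do in the `go` procedure:

```python
# Forward-Euler discrete update: dt = 0.1 s per tick, 60 ticks = 6 s.
k1, k2, k3 = 1e-6, 3e-7, 2e-5
A, B, C, D, E, F = 500.0, 500.0, 500.0, 10.0, 10.0, 10.0
dt = 0.1
for tick in range(60):
    r1 = k1 * A * B       # A + B  -> C
    r2 = k2 * A**2 * C    # 2A + C -> D
    r3 = k3 * A * E       # A + E  -> F
    A += dt * (-r1 - 2 * r2 - r3)
    B += dt * (-r1)
    C += dt * (r1 - r2)
    D += dt * (r2)
    E += dt * (-r3)
    F += dt * (r3)
print(A, B, C, D, E, F)
```

This is the simplest possible scheme; a smaller dt (more ticks per second) brings the discrete result closer to the Matlab ODE solution.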

Time series forecasting

I have an input series and a target series. However, the target series lags 3 steps behind the input. Can I still use narx or some other network?
http://www.mathworks.co.uk/help/toolbox/nnet/ref/narxnet.html
Predict: y(t+1)
Input:
x(t) |?
x(t-1)|?
x(t-2)|?
x(t-3)|y(t-3)
x(t-4)|y(t-4)
x(t-5)|y(t-5)
...
During training I have y(t-2), y(t-1), y(t) in advance, but when I do the prediction in real life those values only become available 3 steps later, because I calculate y from the next 3 inputs.
Here are some options:
1) You could have two inputs and one output, as
x(t), y(t-3) -> y(t)
x(t-1),y(t-4) -> y(t-1)
x(t-2),y(t-5) -> y(t-2)
...
and predict the single output y(t)
2) You could also use ar or arx with na = 0, nb > 0, and nk = 3.
3) You could also have four inputs, where 2 of the inputs are estimated, and one output, as
x(t), y(t-3), ye(t-2), ye(t-1) -> y(t)
x(t-1),y(t-4), y(t-3), ye(t-2) -> y(t-1)
x(t-2),y(t-5), y(t-4), y(t-3) -> y(t-2)
...
and predict the single output y(t), using line 3 and higher as training data
4) You could set up the input/output as in options 1 or 3 and use n4sid
I have a similar problem, but without any measurable inputs, and I'm trying to see how the error grows as the forecast distance and model complexity increase. So far I've only tried approach 2, setting nb = 5 to 15 in steps of 5 and varying nk from 20 to 150 in steps of 10, and plotted contours of the maximum error. In my case, I'm not interested in predictions of fewer than 20 time steps.
Define a window size of your choice (you need to try different sizes to see which works best) and turn this into a regression problem: use the values of x(t) and y(t) from t = T-2 ... T-x, where x-2 is the window size. Then use regress() to train a regression model and use it for prediction.
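The lagged-regression setup of option 1 can be sketched as follows. This is a Python illustration (the answers above use Matlab's regress/arx); `make_lagged` is a hypothetical helper, and the toy series simply delays x by 3 steps:

```python
import numpy as np

def make_lagged(x, y, lag=3):
    """Build design rows [x(t), y(t-lag)] with target y(t)."""
    X = np.column_stack([x[lag:], y[:-lag]])
    return X, y[lag:]

# toy data: y is x delayed by 3 steps
x = np.arange(20, dtype=float)
y = np.roll(x, 3)
y[:3] = 0.0

X, target = make_lagged(x, y, lag=3)
Xd = np.column_stack([X, np.ones(len(X))])   # add intercept column
coef, *_ = np.linalg.lstsq(Xd, target, rcond=None)
pred = Xd @ coef                             # in-sample predictions of y(t)
```

At prediction time only x(t) and the 3-step-old y(t-3) are needed, which matches the real-life availability constraint described in the question.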