I'm trying to create a switch statement as below, which works well until something crosses a page. The switch destination is auto generated, which is why its in another file. 'structure, x' holds the offset (the case switch). In the case below, it will be either $00, $02, $04 or $06.
Is there anyway to ensure that the returnAddr isn't at $xx00? (Does that actually matter here?) And that the switchlist doesn't cross a boundary?
lda #>returnAddr
pha
lda #<returnAddr-1
pha
; store where we want to go
lda switchlist+1
pha
lda switchlist
clc
adc structure, x
pha
rts ; make call to the proc in the switchlist
returnAddr:
; ...
rts
and in another file I have (where case_x are function labels)
switchlist:
.word case_1
.word case_2
.word case_3
.word case_4
I was always taught to structure it like this:
LDA #2 ;say we want case_3
ASL
TAX
JSR handleSwitch ;pushes the return address for us.
;return here after we goto the desired case.
handleSwitch:
LDA switchlist+1,x
pha
LDA switchlist
pha
rts ;"return" to desired case
case_1:
rts ;each of these returns to after "jsr handleSwitch"
case_2:
rts
case_3:
rts
case_4:
rts
The key here is that you JSR to your trampoline, inlining it won't have the desired effect. If you build it this way it doesn't matter if your return address crosses a page boundary.
Related
I'm new to reinforcement learning at all, so I may be wrong.
My questions are:
Is the Q-Learning equation ( Q(s, a) = r + y * max(Q(s', a')) ) used in DQN only for computing a loss function?
Is the equation recurrent? Assume I use DQN for, say, playing Atari Breakout, the number of possible states is very large (assuming the state is single game's frame), so it's not efficient to create a matrix of all the Q-Values. The equation should update the Q-Value of given [state, action] pair, so what will it do in case of DQN? Will it call itself recursively? If it will, the quation can't be calculated, because the recurrention won't ever stop.
I've already tried to find what I want and I've seen many tutorials, but almost everyone doesn't show the background, just implements it using Python library like Keras.
Thanks in advance and I apologise if something sounds dumb, I just don't get that.
Is the Q-Learning equation ( Q(s, a) = r + y * max(Q(s', a')) ) used in DQN only for computing a loss function?
Yes, generally that equation is only used to define our losses. More specifically, it is rearranged a bit; that equation is what we expect to hold, but it generally does not yet precisely hold during training. We subtract the right-hand side from the left-hand side to compute a (temporal-difference) error, and that error is used in the loss function.
Is the equation recurrent? Assume I use DQN for, say, playing Atari Breakout, the number of possible states is very large (assuming the state is single game's frame), so it's not efficient to create a matrix of all the Q-Values. The equation should update the Q-Value of given [state, action] pair, so what will it do in case of DQN? Will it call itself recursively? If it will, the quation can't be calculated, because the recurrention won't ever stop.
Indeed the space of state-action pairs is much too large to enumerate them all in a matrix/table. In other words, we can't use Tabular RL. This is precisely why we use a Neural Network in DQN though. You can view Q(s, a) as a function. In the tabular case, Q(s, a) is simply a function that uses s and a to index into a table/matrix of values.
In the case of DQN and other Deep RL approaches, we use a Neural Network to approximate such a "function". We use s (and potentially a, though not really in the case of DQN) to create features based on that state (and action). In the case of DQN and Atari games, we simply take a stack of raw images/pixels as features. These are then used as inputs for the Neural Network. At the other end of the NN, DQN provides Q-values as outputs. In the case of DQN, multiple outputs are provided; one for every action a. So, in conclusion, when you read Q(s, a) you should think "the output corresponding to a when we plug the features/images/pixels of s as inputs into our network".
Further question from comments:
I think I still don't get the idea... Let's say we did one iteration through the network with state S and we got following output [A = 0.8, B = 0.1, C = 0.1] (where A, B and C are possible actions). We also got a reward R = 1 and set the y (a.k.a. gamma) to 0.95 . Now, how can we put these variables into the loss function formula https://imgur.com/a/2wTj7Yn? I don't understand what's the prediction if the DQN outputs which action to take? Also, what's the target Q? Could you post the formula with placed variables, please?
First a small correction: DQN does not output which action to take. Given inputs (a state s), it provides one output value per action a, which can be interpreted as an estimate of the Q(s, a) value for the input state s and the action a corresponding to that particular output. These values are typically used afterwards to determine which action to take (for example by selecting the action corresponding to the maximum Q value), so in some sense the action can be derived from the outputs of DQN, but DQN does not directly provide actions to take as outputs.
Anyway, let's consider the example situation. The loss function from the image is:
loss = (r + gamma max_a' Q-hat(s', a') - Q(s, a))^2
Note that there's a small mistake in the image, it has the old state s in the Q-hat instead of the new state s'. s' in there is correct.
In this formula:
r is the observed reward
gamma is (typically) a constant value
Q(s, a) is one of the output values from our Neural Network that we get when we provide it with s as input. Specifically, it is the output value corresponding to the action a that we have executed. So, in your example, if we chose to execute action A in state s, we have Q(s, A) = 0.8.
s' is the state we happen to end up in after having executed action a in state s.
Q-hat(s', a') (which we compute once for every possible subsequent action a') is, again, one of the output values from our Neural Network. This time, it's a value we get when we provide s' as input (instead of s), and again it will be the output value corresponding to action a'.
The Q-hat instead of Q there is because, in DQN, we typically actually use two different Neural Networks. Q-values are computed using the same Neural Network that we also modify by training. Q-hat-values are computed using a different "Target Network". This Target Network is typically a "slower-moving" version of the first network. It is constructed by occasionally (e.g. once every 10K steps) copying the other Network, and leaving its weights frozen in between those copy operations.
Firstly, the Q function is used both in the loss function and for the policy. Actual output of your Q function and the 'ideal' one is used to calculate a loss. Taking the highest value of the output of the Q function for all possible actions in a state is your policy.
Secondly, no, it's not recurrent. The equation is actually slightly different to what you have posted (perhaps a mathematician can correct me on this). It is actually Q(s, a) := r + y * max(Q(s', a')). Note the colon before the equals sign. This is called the assignment operator and means that we update the left side of the equation so that it is equal to the right side once (not recurrently). You can think of it as being the same as the assignment operator in most programming languages (x = x + 1 doesn't cause any problems).
The Q values will propagate through the network as you keep performing updates anyway, but it can take a while.
I try to do a 10 folds cross validation without using built-in function to train and recognize digit from 0-9 I have sample of 500 picture(50 for each digit to train and test.)
I try implement the answer MATLAB: 10 fold cross Validation without using existing functions and other websites but it didn't help that much. Mostly because I'm new to MATLAB so I don't know much about what I should do to tweak it.
This is the code I have so far.
c=zeros(10,size(x,2),size(x,3));
K=10;
k=10;
test= 1:50/K;
for fold =1:K
if(test(1)~=1)
train = x(1:test(1)-1,:,:);
if (test(5) ~=50)
train=[train ; x(test(end):50,:,:)];
end
else
train = x(test(1):50,:,:);
end
test = test+ones(1,50/K)*50/K;
end
for i =0:9
test=test+50/K*ones(1,5);
c(i+1,:,:)=cal_likelihood(x(1+i*50:50+i*50,:,:),50/k*(k-1));
end
Variable explanation
x is the 500x28x28 double where it keeps all 500 digit picture.
test is a test set.
train is a training set.
In order to do 10 folds cross validation I need to change training set like
1st fold : 1:5 for test,6:45 for train
2nd fold : 6:10 for test,1:5 11:50 for train and so on
The problem is I don't know how to shift the training set from one set to another like from 6:45 to 1:5 and 11:50. or Can I write a better loop than this?
PSS. If someone who answer this don't mind What does 500x28x28 double actually mean.
There are a few ways you could write this, some of which are easier to understand than others. Matlab is quite nice to write in as while expressions such as 1:3 evaluate to [1,2,3], the expression 1:0 evaluates to the empty set. So, it is very straightforward to generate the sets without having to use if statements.
I'd start off the loop as:
samples_per_digit=50;
block_sze=samples_per_digit/K;
for fold =1:K
test_ind = 1+(fold-1)*block_sze:fold*block_sze;
train_ind = [1:(fold-1)*block_sze, (fold*block_sze+1):samples_per_digit];
for i=0:9
train=x(train_ind+i*samples_per_digit,:,:);
test=x(test_ind+i*samples_per_digit,:,:);
% Perform training and validation in here for this fold of the digit i
You can verify that test_ind and train_ind correspond to the subsets of blocks of training and validation that you need. It is only in the innermost loop that these translate to the matrices corresponding to the digit images, using the value of i to compute the offset. Of course, if you wish, you can swap the order of the loops, computing all of the folds for a single digit. It all depends on how you wish to store your results.
I have one "Thermal Mass" block in Simulink, which represents a thermal mass, which is the ability of a material or combination of materials to store internal energy. In this standard block of Simulink, the initial temperature must be entered. Only one signal can be connected to the block. The source code of the block looks like following:
component mass
% Thermal Mass
% The block represents a thermal mass, which is the ability of a material
% or combination of materials to store internal energy. The property is
% characterized by mass of the material and its specific heat.
%
% The block has one thermal conserving port.
% The block positive direction is from its port towards the block. This
% means that the heat flow is positive if it flows into the block.
% Copyright 2005-2013 The MathWorks, Inc.
nodes
M = foundation.thermal.thermal; % :top
end
parameters
mass = { 1, 'kg' }; % Mass
sp_heat = { 447, 'J/(kg*K)' }; % Specific heat
end
variables
Q = { 0, 'J/s' }; % Heat flow
end
variables(Conversion=absolute)
T = { 300, 'K' }; % Temperature
end
function setup
% Parameter range checking
if mass <= 0
pm_error('simscape:GreaterThanZero','Mass')
end
if sp_heat <= 0
pm_error('simscape:GreaterThanZero','Specific heat')
end
end
branches
Q : M.Q -> *;
end
equations
T == M.T;
Q == mass * sp_heat * T.der;
assert(T>0, 'Temperature must be greater than absolute zero')
end
end
I would like to build another component, whose initial temperature can come from another block, so that it can be also calculated somewhere else. So, one input parameter and everything else should be the same. I am new to Simulink and don't know much about the domains. Any idea, how this can be done?
Thank you!
Parameters entered on a Simulink block are usually utilized for initial values and tuning of block behavior. While newer versions of Simulink will allow you to tune some parameters during simulation, others will be locked down and un-modifiable. This may mean that you need to first execute a model to calculate the initial value for your Thermal Mass, and then start up a second simulation using that temperature as an initial value.
I believe the Simulink help on how to control block parameters will be useful. Depending on the specific design of your model, different techniques found here may be more or less applicable, but generally speaking I know of 2 easy and simple ways to accomplish modifying a mask value.
Set the value to a variable in your Matlab base workspace.
Place the block inside a Masked subsystem. The mask can be used to define a variable that accessible to all the blocks inside it.
This is not possible, while you can execute some pre-processing to determine initial temperature you can not have this as an input from other blocks.
The workaround described by Jared is probably what you're looking for.
It's actually pretty rare to need to do this, if you tell us why you'r looking to set this up, we may be able to help.
Objective
Currently I am trying to create an uncertain system based on a family of statespace models using ucover. For this I am basing my script on the documentation "Modeling a Family of Responses as an Uncertain System" which shows the technique for creating an uncertain system based on a single-input-single-output system (SISO) explicitly but makes it clear that this is fully useable for MIMO systems as well.
Technical details
Specifically it is stated with the documentation of ucover that it supports MIMO systems:
USYS = ucover(PARRAY,PNOM,ORD1,ORD2,UTYPE) returns an uncertain
system USYS with nominal value PNOM and whose range of behaviors
includes all LTI responses in the LTI array PARRAY. PNOM and PARRAY
can be SS, TF, ZPK, or FRD models. USYS is of class UFRD if PNOM
is an FRD model and of class USS otherwise.
ORD1 and ORD2 specify the order (number of states) of each diagonal
entry of W1 and W2. If PNOM has NU inputs and NY outputs, ORD1 and ORD2
should be vectors of length:
UTYPE ORD1 ORD2
InputMult NU-by-1 NU-by-1
OutputMult NY-by-1 NY-by-1
Additive NY-by-1 NU-by-1
In my case I am using both 2 inputs and 2 outputs so both ORD1 adn ORD2 should be 2 by 1. I am using 8 as the number of states used by W1 and W2 (just because, I will try adjusting that once this issue is sorted).
The Attempt
Based on the SISO example I have attempted to create a MIMO example, this is shown below
noInputs=2;
noOutputs=2;
noOfStates=4;
Anom=rand(noOfStates,noOfStates);
Bnom=rand(noOfStates,noInputs);
Cnom=rand(noOutputs,noOfStates);
Dnom=rand(noOutputs,noInputs);
Pnom=ss(Anom, Bnom, Cnom, Dnom);
p1 = Pnom*tf(1,[.06 1]); % extra lag
p2 = Pnom*tf([-.02 1],[.02 1]); % time delay
p3 = Pnom*tf(50^2,[1 2*.1*50 50^2]);
Parray = stack(1,p1,p2,p3);
Parrayg = frd(Parray,logspace(-1,3,60));
[P,Info] = ucover(Parrayg,Pnom,[8 8]',[8 8]','InputMult');
Wt = Info.W1;
bodemag((Pnom-Parray)/Pnom,'b--',Wt,'r'); grid
title('Relative Gaps vs. Magnitude of Wt')
The problem
Unlike the image in the documentation my uncertain model (when put through a bode plot) only shows a response on the lead diagonal. See the screenshot for what I mean:
Where blue is the individual models and red is the uncertain model
Question
How can I create an uncertain system based on a family of MIMO statespace models that correctly covers responses between all inputs and outputs?
If you use [8,8]' as your uncertainty order structure ord1,ord2, matlab will try to have two diagonal blocks in your uncertainty block each.
However matlab only supports diagonal weighting functions (due to some complications about nonconvex search) and what you are plotting is the diagonal weighting that will multiply the 2x2 full block LTI dynamic uncertainty. W1 affects the rows and W2 affects the columns of the uncertainty.
Hence you should check the samples of that uncertainty multiplied by the weights and then the plant. Then you can compare it with the uncertain model stack. Notice that your off-diagonal entries are practically zero (<1e-10) hence almost decoupled. But W1, W2 search looks for the H-infinity norm hence you don't get to see perfect covering at each block of the Bode plot. It combines the rows/columns of the required minimum uncertainty amount (see the examples on the help file). That's why you see 1 plot per each weight in the demos.
If you would like to model the each uncertainty affecting each block separately then you need to form a new augmented LFT such that the uncertainty is four 1x1(scalar) LTI dynamic uncertainty on the diagonal then you can have four entries in ord1 and ord2.
Since this is a MIMO system, you shouldn't compare things element-by-element. You are using the input-multiplicative form, so the uncertain system being created is of the form
Pnom*(I + W1*Delta*W2), where Delta is any stable (2-by-2, in this case) system, with ||Delta|| <= 1. So, to verify that the produced uncertain model "covers" your array of system, you should think of the equation
Parray = Pnom*(I + W1*Delta*W2)
and solve for Delta. Plot it (with SIGMA, say), and you will see that it is less than 1 in magnitude, for all frequencies. The Matlab code would be (multiply everything listed below, in order - my mulitplication symbol is not showing up in the posted answer...)
sigma(inv(W1)*inv(Pnom)*(Parrayg-Pnom)*inv(W2))
Now, using the syntax you specified, you are using weights W1 and W2 of the following form:
W1 = [W1_11 0;
0 W1_22]
and
W2 = [W2_11 0;
0 W2_22]
where you've specified 8th-order fits for all nonzero entries. Certainly for your example, this is overkill (although on a richer problem, it might be fine).
I would try much simpler, like
ucover(Parrag,Pnom,3,[],'InputMult')
That syntax will make an uncertain model of the form
Pnom*(I + w1*Delta)
where w1 is a scalar, 3rd order system. You could still see the covering by plotting SIGMA(Delta), namely
sigma((1/w1)*inv(Pnom)*(Parrayg-Pnom))
I hope that helps.
In order to create discrete or continuous time uncertain systems you can use uss associated with ureal.
Quick example
Define an uncertain propeller radius
% Propeller radius (m)
rp = ureal('rp',13.4e-2,'Range',[0.08 0.16]);
Define uncertain continuous time system
tenzo_unc = uss(A,Bw,Clocal,D,'statename',states,'inputname',inputs,'outputname',outputsLocal);
Simulate step response:
N = 5;
% Prende alcuni campioni del sistema incerto e calcola bound su incertezze
for i=1:1:N
sys{i} = usample(tenzo_unc);
step(sys{i})
hold on
cprintf('text','.');
end
Complete example
Quadcopter uncertain linearized model control with LQR. Code is available here
Step response
Closed Loop Step response
<script src="https://gist.github.com/GiovanniBalestrieri/f90a20780eb2496e730c8b74cf49dd0f.js"></script>
NB:
If you don't have the utility cprintf, include this script in your folder and use it.
I want implement a recommender system by Matlab, I choose MoveiLens Dataset , and svm algorithm.
I implement a function that returns two sets of Items. The first set is items that user Rate them more than 3 and second set is items that user Rate less than 4.
Rate is a matrix with 3 column, the first one is user Id,second one is Item Id, and third one is User Rate to this item .
function [like,dislike]=UsetLike(User,Rate)
k1=1;
k2=1;
for i=1:size(Rate,1)
if(Rate(i,1)==User)
if(Rate(i,3)>3)
like(k1)= Rate(i,2);
k1=k1+1;
end
if(Rate(i,3)<=3)
dislike(k2)= Rate(i,2);
k2=k2+1;
end
end
end
end
Then I write another function that train by svm such as this.
Feature is a matrix that in (i) row shows item (i) features, that has 19 features with 0 or 1 value.
function [svmModel]=TrainSVM(like,dislike,Feture)
group1=zeros(size(like,2),19);
group2=zeros(size(dislike,2),19);
for i=1: size(group1,1)
group1(i,:)= Feture(like(i),:);
end
for i=1: size(group2,1)
group2(i,:)= Feture(dislike(i),:);
end
Dataset=cat(1,group1,group2);
group=[repmat({'like'},1,size(group1,1)) repmat({'dislike'},1,size(group2,1) )]';
svmModel = svmtrain(Dataset, group, ...
'Autoscale',true, 'Showplot',false, 'Method','QP', ...
'BoxConstraint',2e-1, 'Kernel_Function','rbf', 'RBF_Sigma',1);
end
Now I want to know if my solution is right? As it was in mostly 3D space I can not precept it. What if I want to use Rate (1-5) instead of like or dislike?
In order to use the Rates (1-5) as an output signal you will need a modification called ranking SVM, as this is not just a multilabel classification, but rather a ranking problem.