Why does my PPO RL agent's action still exceed its upper and lower limits in MATLAB/Simulink?

I am using MATLAB's default representation of a PPO agent. I want one of my actions to stay in the range 0 to 1 and the others in the range -1 to 1. I have already set UpperLimit to 1 and LowerLimit to 0 or -1.
obsInfo = rlNumericSpec([17 1]);
actInfo = rlNumericSpec([5 1], ...
"UpperLimit", 1, ...
"LowerLimit",[0; -1; -1; -1; -1]);
env = rlSimulinkEnv("test_2", "test_2/agent",obsInfo, actInfo, "UseFastRestart", "off");
ppo_opt = rlPPOAgentOptions("SampleTime", 0.01);
agent_ppo = rlPPOAgent(obsInfo, actInfo, ppo_opt);
But as you can see, my actions can still exceed the limits (scope 1 should stay in the range 0 to 1, scope 5 in the range -1 to 1). This happens for all action signals. If I change the agent (e.g. to DDPG), this does not happen. How can I fix that?
[Figure: scope showing the dynamics of two of the RL agent's action signals]

If your actor works in a continuous action space, it has two outputs: a mean and a standard deviation. If the standard deviation path of your actor outputs a positive value, the sampled action deviates randomly from the mean, and the larger the standard deviation, the larger the deviation. So try to make the standard deviation path of your actor output zero. For example, if sd = 0.1 and mean = 0, think of the actual output as lying roughly between -0.1 and 0.1 (this is just an illustration, not an exact calculation).
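The effect described above can be reproduced outside Simulink. The following is a minimal sketch, not the toolbox's internal code: actionMean and actionStd are hypothetical actor outputs, and the sample is drawn the way a Gaussian policy would draw it. It also shows one possible remedy that the answer does not mention, namely saturating the action before it is applied (in a Simulink model this would correspond to a Saturation block on the action signal).

```matlab
% Sketch: a Gaussian policy sample can leave the limits; saturating it keeps it inside.
lowerLimit = [0; -1; -1; -1; -1];   % limits from the question
upperLimit = ones(5, 1);

actionMean = [0.9; -0.9; 0; 0; 0];  % hypothetical actor mean output
actionStd  = 0.3 * ones(5, 1);      % hypothetical actor standard deviation output

rng(0);                             % fixed seed so the run is repeatable
rawAction = actionMean + actionStd .* randn(5, 1);   % sample is not bounded by the limits

% Element-wise saturation of the sampled action
clippedAction = min(max(rawAction, lowerLimit), upperLimit);

disp([rawAction clippedAction]);    % left column: raw sample, right column: saturated
```

With a smaller actionStd the raw samples stay closer to the mean, which is the point the answer makes; the saturation guarantees the bounds regardless of the standard deviation.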

Related

Number of ODE Events changes with varying condition simple ODE

I have the following simple ODE:
dx/dt=-1
With initial condition x(0)=5, I am interested in when x(t)==1. So I have the following events function:
function [value,isterminal,direction] = test_events(t,x)
value = x-1;
isterminal = 0;
direction = 0;
end
This should produce an event at t=4. However, if I run the following code I get two events, one at t=4, and one at the nearby location t=4+5.7e-14:
options = odeset('Events',@test_events);
sol = ode45(@(t,x)-1,[0 10],5,options);
fprintf('%.16f\n',sol.xe)
% 4.0000000000000000
% 4.0000000000000568
If I run similar codes to find when x(t)==0 or x(t)==-1 (value = x; or value = x+1; respectively), I have only one event. Why does this generate two events?
UPDATE: If the options structure is changed to the following:
options = odeset('Events',@test_events,'RelTol',1e-4);
...then the ODE only returns one event at t=4+5.7e-14. If 'RelTol' is set to 1e-5, it returns one event at t=4. If 'RelTol' is set to 1e-8, it returns the same two events as the default ('RelTol'=1e-3). Additionally, changing the initial condition from x(0)=5 to x(0)=4 produces a single event, but setting x(0)=4 and 'RelTol'=1e-8 produces two events.
UPDATE 2: Observing the sol.x and sol.y outputs (t and x, respectively), the time progresses as integers [0 1 2 3 4 5 6 7...], and x progresses as integers up until x(t=5) like so: [5 4 3 2 1 1.11e-16 -1.000 -2.000...]. This indicates that there is something that occurs between t=4 and t=5 that creates a 'bump' in the ODE solution. Why?
One speculation that might explain how rounding errors could occur in this simple problem: the solution between the internal steps is interpolated from the evaluations k_n of the ODE derivative function, which is also called "dense output". The theoretical form is
b_1(u)k_1 + b_2(u)k_2 + ... + b_s(u)k_s
where 0 <= u <= 1 is the parameter over the interval between the internal points, that is, t = (1-u)*t_k + u*t_{k+1}.
The coefficient polynomials are non-trivial. While in this example all the k_i = -1 are constant, the evaluation of the sum b_1(u)+...+b_s(u) can accumulate rounding errors that become visible in the solution value close to a root, even if y_k and y_{k+1} are exact. In that range of accumulated floating point noise, the value might oscillate around the root, leading to the detection of multiple zero crossings.
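If the duplicate event is only a nuisance, one possible workaround (not part of the answer above) is to merge event times that lie closer together than a chosen tolerance. The sketch below uses the two event times reported in the question; the tolerance tol is an assumption and would have to be chosen for the problem at hand.

```matlab
% Event times reported in the question (sol.xe is a row vector)
xe = [4.0000000000000000, 4.0000000000000568];

tol = 1e-9;                     % assumed: events closer than this count as one event

% Keep the first event and every event that is more than tol after its predecessor
keep = [true, diff(xe) > tol];
xeUnique = xe(keep);

fprintf('%.16f\n', xeUnique);
```

This only post-processes the reported events; it does not change how the solver locates them.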

Would there be an issue with SystemVerilog functional coverage bins that have similar sequences?

covergroup xxxx ;
  yyyy : coverpoint (zzzz)
  {
    // transition bins are written with parentheses; braces define value sets
    bins sequence_1 = (0=>1=>2=>3);
    bins sequence_2 = (0=>1=>2=>3=>4);
    bins sequence_3 = (0=>1=>2=>3=>4=>5=>6=>7=>8=>9);
    bins sequence_4 = (0=>1=>2=>3=>4=>5=>6=>7=>8=>9=>10=>11=>12=>13=>14=>15=>16=>17);
    bins sequence_5 = (0=>1=>2=>3=>4=>5=>6=>7=>8=>9=>10=>11=>12=>13=>14=>15=>16=>17=>18);
    bins sequence_6 = (0=>1=>2=>3=>4=>5=>6=>7=>8=>9=>10=>11=>12=>13=>14=>15=>16=>17=>18=>19);
  }
endgroup
zzzz is a counter register that counts from 0 up to 3, 4, 9, 17, 18 or 19, depending on its input.
In coding this functional coverage, the idea is to hit exactly one of the bins when a specific series of transitions occurs.
So if the transitions go, for example, from 0 to 4 as in sequence_2, would that also hit sequence_1, since the 0-to-3 sequence of sequence_1 is contained in it?
Thanks
Yes, if sequence_2 is hit, that implies that sequence_1 is also hit. What you really want is to cover what happens when the counter reaches its limit. i.e. does it go back to 0 or does it stay at the limit in the next cycle? There is no need to elaborate every intermediate value of the counter - a covergroup is not a checker. It is only recording that a certain scenario in your test was achieved.

How do I use eps in this situation where I am taking differences of cumulative sums

First, the data:
orig = reshape([0.0000000000000000 0.3480000000000000 0.7570000000000000 1.3009999999999999 2.8300000000000001 4.7519999999999998 5.2660000000000000 5.8120000000000003 14.3360000000000000 15.3390000000000000 ],[10 1])
change = reshape([0.0000000000000000 0.3480000000000000 0.0000000000000000 0.9530000000000000 1.5290000000000001 1.9219999999999997 0.5140000000000002 0.5460000000000003 0.0000000000000000 9.5270000000000010 ],[10 1])
change = cumsum(change)
orig is a vector of seconds elapsed. change is a vector derived by taking differences between (some) elements of orig. The cumulative sum of change has some elements actually equal to the corresponding element in orig.
However, due to precision issues:
diff = orig - change
gives
diff =
0
0
0.409
0
0
0
0
0
8.524
-1.77635683940025e-15
It seems that if I run the following command:
diff(abs(diff) <= eps(orig)) = 0
then this sets entries which should be zero, but are not due to precision issues, to be zero.
My question is, is this the correct way to do it? Why is the comparison <= instead of <? Should the statement be:
diff(abs(diff) < k*eps(orig)) = 0
for some k > 1 to give some tolerance? If so, how would one pick k?
In case it is necessary to know how change is derived from orig, the following alternate example also shows this behaviour:
orig = reshape([0.0000000000000000 0.3480000000000000 0.7570000000000000 1.3009999999999999 2.8300000000000001 4.7519999999999998 5.2660000000000000 5.8120000000000003 14.3360000000000000 15.3390000000000000 ],[10 1])
change = orig - [0; orig(1:end-1)]
change = cumsum(change)
diff = orig - change
The following statement will be true only if the "almost zero" value is off by at most 1 bit.
abs(diff) <= eps(orig)
1 bit is a ridiculously high precision to ask for, one that you most likely cannot achieve. Generally, you need to define your threshold yourself, such as
abs(diff) <= 1e-12
You also ask how to choose this value. Answer: there is no way we can tell you that. It is specific to the algorithm, application, units, computer, [...].
You are computing distances between particles? Maybe you need a smaller tolerance. You are doing economic profit calculations? 1e-12 is then a decimal you won't get in cash, for sure. Use 1e-4 instead. Or are you using an algorithm that does numerical approximations? Then you need a higher tolerance. How much tolerance you are OK with is, and will always be, a user choice.
Note: you need to be aware of the types you are using to set this minimum threshold correctly. MATLAB uses double by default, but if you are using other types, then this threshold is too strict. As an alternative, you can use
abs(diff) <= 100*eps(class(diff))
if your data type is not fixed/known.
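As an illustration of a user-chosen tolerance, here is a minimal sketch on the data from the question. The safety factor k = 100 and the choice to scale eps by the largest magnitude in orig are assumptions made for this example, not a general recommendation. The difference is stored in d rather than diff so that the built-in diff function is not shadowed.

```matlab
orig = [0; 0.348; 0.757; 1.301; 2.83; 4.752; 5.266; 5.812; 14.336; 15.339];
change = cumsum([0; 0.348; 0; 0.953; 1.529; 1.9219999999999997; ...
                 0.5140000000000002; 0.5460000000000003; 0; 9.527]);

d = orig - change;              % contains two real differences plus rounding noise

k = 100;                        % assumed safety factor
tol = k * eps(max(abs(orig)));  % tolerance scaled to the largest value in the data

d(abs(d) <= tol) = 0;           % rounding noise is set to exactly zero
disp(d.');
```

The two genuine differences (about 0.409 and 8.524) are many orders of magnitude above tol and are left untouched, while the residue of about -1.8e-15 in the last entry is zeroed.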

MATLAB - Event Location fail to work?

The event location function fails to find a simple event when solving a 5-variable system of ODEs. Here's the code:
options1 = odeset('RelTol',1e-5,'AbsTol',1e-9,'Events',@evento1);
[T_ode1,X_ode1,te]=ode15s(@Seno,[0 2],[0 0 0 0 0],options1);
function [y] = Seno(t,x)
%Parameters
V=20;
R=1e6;
epsilon=8.87e-12;
d=4.5e-5;
k_sp=10;
gamma1=0.04;
gamma2=0.1;
m=66e-3;
A=0.1;
omega=80;
h=3.8e-6;
l=2e-3;
N=142;
%Variable redefinition
%x=[x,xdot,q,y,ydot]
X=x(1);
Xp=x(2);
Q=x(3);
Y=x(4);
Yp=x(5);
%system of differential equations
y(1)=Xp; %y1(1)=position1
y(2)=-(2*k_sp*X/m)-(gamma1*Xp/m)+((epsilon*2*d*h*N*l)*X*V^2/((d^2-X^2)^2))+A*sin(omega*t); %y1(2)=velocity1
y(3)=1/R*(V-Q*(d^2-X^2)/(epsilon*2*d*h*N*l)); %y1(3)=charge
y(4)=Yp; %position 2
y(5)=-gamma2*Yp; %velocity2
y=y';
end
function [condition,ends,directions] = evento1(t,y)
a=2e-6;
c=2e-6;
b=1.5e-6;
condition= [(y(1)^2)-(a+c)^2, (y(4)^2)-(y(1)+b)^2, (y(4)^2)-(y(1)-b)^2];
ends = [1, 1, 1]; % Halt integration
directions = [1, -1, 1];
end
With all the initial conditions set to 0, as you can see, the first event the event function should find is the third condition, when y(1) passes through 1.5e-6 (y(4) is 0). Unfortunately, the ODE solver ignores that event and stops the solution when the first one is satisfied.
I can't see why this happens! I tried debugging, and the system properly passes through 1.5e-6 but does not treat it as an event (i.e. it doesn't start to evaluate the solution at more points near the event).
Thanks for your time and sorry for my English.
As I mentioned in the comments, you can plot your event conditions after simulating the system:
options1 = odeset('RelTol',1e-5,'AbsTol',1e-9,'Events',@evento1);
[T_ode1,X_ode1,te] = ode15s(@Seno,[0 2],[0 0 0 0 0],options1);
a = 2e-6;
c = 2e-6;
b = 1.5e-6;
y = X_ode1;
condition = [y(:,1).^2-(a+c)^2 y(:,4).^2-(y(:,1)+b).^2 y(:,4).^2-(y(:,1)-b).^2];
figure;
plot(T_ode1,condition);
hold on;
plot(T_ode1([1 end]),[0 0],'k--',te,0,'k*');
legend('Condition 1','Condition 2','Condition 3','Location','W');
xlabel('Time');
ylabel('State');
ylim([-1.8e-11 2e-12]);
This results in a plot that looks like this:
If you zoom in on the plot you'll see that the third condition (yellow) never crosses zero and thus never triggers an event. Eventually the first event (blue) crosses zero and does trigger an event. Adjusting the integration tolerances doesn't appear to change this behavior (at best the third condition might asymptotically touch, but not cross, zero). If you want the third condition to trigger an event, you'll either need to change your parameters, change the condition, change your initial conditions, or change your ODEs.
I'm not sure if or why you think the third condition should cross zero, but if the system is numerically sensitive, then you may need to compensate for this by specifying the parameters more precisely or artificially biasing the point of zero crossing.
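A possible explanation for why the third condition only touches zero: with zero initial conditions, y(4) stays at 0 (its derivative y(5) starts at 0 and obeys y(5)' = -gamma2*y(5)), so the third condition reduces to -(y(1)-b)^2, which is never positive. An event is only detected at a sign change. The sketch below illustrates this on a few hypothetical samples of y(1); altCond is a hypothetical alternative condition, not something proposed in the answer, and whether it matches the intended physics is for the asker to decide.

```matlab
b  = 1.5e-6;                      % threshold from evento1
x  = linspace(0, 3e-6, 6);        % hypothetical samples of y(1); none equals b exactly
y4 = 0;                           % y(4) stays at 0 for zero initial conditions

cond3   = y4.^2 - (x - b).^2;     % third condition from the question
altCond = x - b;                  % hypothetical alternative, linear in y(1)

% A sign change between neighbouring samples is what event detection looks for
cond3Crosses   = any(diff(sign(cond3)) ~= 0);
altCondCrosses = any(diff(sign(altCond)) ~= 0);

fprintf('cond3 changes sign:   %d\n', cond3Crosses);
fprintf('altCond changes sign: %d\n', altCondCrosses);
```

The squared form stays negative on both sides of y(1) = b, whereas the linear form goes from negative to positive there.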

find consecutive nonzero values

I am trying to write a simple MATLAB program that will find the first chain (more than 70) of consecutive nonzero values and return the starting value of that consecutive chain.
I am working with movement data from a joystick and there are a few thousand rows of data with a mix of zeros and nonzero values before the actual trial begins (coming from subjects slightly moving the joystick before the trial actually started).
I need to get rid of these rows before I can start analyzing the movement from the trials.
I am sure this is a relatively simple thing to do so I was hoping someone could offer insight.
Thank you in advance
EDIT: Here's what I tried:
s = zeros(size(x1));
for i=2:length(x1)
if(x1(i-1) ~= 0)
s(i) = 1 + s(i-1);
end
end
display(s);
This is for a vector x1 that has a max chain of 72, but I don't know how to find the max chain and return its first value, so that I know where to trim. I also really don't think this is the best strategy, since the max chain in my data will be tens of thousands of values.
This answer is generic for any chain size. It finds the longest chain in a vector x1 and retrieves the first element of that chain, val.
First we'll use bwlabel to label the connected components, for example:
s=bwlabel(x1);
Then we can use tabulate to get a frequency table of the labels in s, and find the first element of the biggest connected component:
t=tabulate(s(s>0)); % exclude label 0 (the zero entries) so it cannot be picked as the largest group
[C,I]=max(t(:,2));
val=x1(find(s==t(I,1),1, 'first'));
This works when there is one distinct chain of maximal length. If several chains share the maximal length, max returns the first of them; picking a different one needs only slight modifications to the code.
You don't need to use an auxiliary vector to keep track of the index:
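count = 0;        % length of the current run of nonzero values
index = [];       % start of the first run that reaches 70 elements
lastIndex = [];   % position of the first zero after that run
for i = 1:length(x)
    if x(i) ~= 0
        count = count + 1;
    elseif count >= 70
        lastIndex = i;   % a run of at least 70 has just ended
        break;
    else
        count = 0;       % run was too short, start over
    end
    if count == 70
        index = i - 69;  % first element of the 70-long run
    end
end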
To remove everything up to and including the zero that ends the chain from x, you can simply do:
x = x(lastIndex + 1:end);
EDIT (based off comment):
The reason your approach didn't work is that you didn't reset the counter when you ran into a 0. That's what the:
else
count = 0;
is for; it resets the process, if you will.
For some more clarity, in your original code, this would be reflected by:
if x1(i-1) ~= 0
s(i) = 1 + s(i-1);
else
s(i) = 0;
end
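For comparison, the run-finding discussed in the answers above can also be done without a loop or extra toolboxes. The following is a minimal sketch on a small hypothetical vector; minLen = 3 stands in for the 70 of the real data, and all variable names are illustrative.

```matlab
x1 = [0 3 0 0 1 1 1 1 1 0 2 2];     % hypothetical data; the real data has thousands of rows
minLen = 3;                         % stand-in for 70

nz = (x1(:).' ~= 0);                % logical row vector, true where x1 is nonzero
d  = diff([0 nz 0]);                % +1 where a run starts, -1 just after it ends

runStart = find(d == 1);            % first index of every run of nonzeros
runEnd   = find(d == -1) - 1;       % last index of every run
runLen   = runEnd - runStart + 1;   % length of every run

k = find(runLen > minLen, 1, 'first');   % first run longer than minLen
startIdx = runStart(k);                  % index where that run begins
trimmed  = x1(startIdx:end);             % drop everything before the run

fprintf('first long run starts at index %d\n', startIdx);
```

If no run is long enough, k is empty and startIdx is empty as well, so that case should be checked before trimming.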