Valley-filling optimization in MATLAB - matlab

I'm trying to simulate a distributed scheduler for EV (electric vehicle) charging. The desired behavior is to shave peaks and fill valleys, in other words to minimize the peak-to-average ratio of the electric load on a test grid (as in this paper: web.eecs.umich.edu/~hiskens/publications/1940.pdf).
Picture of wanted result:
So, basically, what I'm doing is this:
send to each user the aggregate load of the rest of the grid
initialize the charge scheduling vector randomly, this will be the starting point x0 of the IPM optimization
optimize: I want to minimize the peak-to-average ratio or, equivalently, the cost (a quadratic function) of the supplied energy
The constraints are the following:
The sum of the charges scheduled between now and the departure moment of the vehicle is equal to the energy requirement (first line of Aeq)
Past scheduling cannot be changed (so it's equal to the past load, ecs.ev_load); scheduling post-departure must be zero (I, the rest of the Aeq matrix)
lower bound of energy consumption is zero
upper bound is a given parameter, ecs.max_load
x0=ecs.schedule;
I=eye(ecs.timeslots);
I(ecs.currenttime:ecs.ev.departure,:)=[];
Aeq=[zeros(1,ecs.currenttime-1) -1*ones(1,ecs.ev.departure-ecs.currenttime+1) zeros(1,ecs.timeslots-ecs.ev.departure);I];
beq=[-1*ecs.ev.requested_chg*ecs.nslotsperhour;ecs.ev_load;zeros(ecs.timeslots-ecs.ev.departure,1)];
A=[];
b=[];
nonlcon=[];
lb=zeros(ecs.timeslots,1);
ub=ecs.max_load*ones(ecs.timeslots,1);
options = optimoptions('fmincon','Algorithm','interior-point','MaxFunEvals',30000);
The objective function par, at each EV, sums:
x: the scheduled load of the vehicle (the optimization variable)
other_loads: the future behavior of every non-EV load (unrealistic), plus the schedules of the other EVs
other_loads=ecs.game_l; %l_n (all schedules in the grid, mine included)
other_loads(:,ecs.id)=[]; %l_-n (removed my own schedule)
schedule = fmincon(@par,x0,A,b,Aeq,beq,lb,ub,nonlcon,options);
function f=par(x)
tot=sum([x other_loads],2);
f=max(tot)/mean(tot);
end
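For comparison, here is a minimal sketch of the quadratic-cost objective mentioned above as an equivalent formulation; the name par_quadratic and the plain sum-of-squares form are assumptions, not the original code. Since max(tot)/mean(tot) is non-smooth where the peak time slot changes, a quadratic cost is the kind of smooth objective that fmincon's interior-point algorithm handles more gracefully:
function f=par_quadratic(x)
tot=sum([x other_loads],2);   % aggregate load in each time slot, as in par above
f=sum(tot.^2);                % smooth quadratic cost; minimizing it fills valleys
end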
Picture of what I'm getting:
Here's my problem: the optimization doesn't work. I keep tinkering with the code, but I always get (almost) the same result, where the valleys aren't filled at all, and the behaviour is almost constant. I verified by debugging that the content of the variables is correct, maybe I'm doing something wrong with the optimizer?
Thanks in advance

How does Double DQN work?

What is the idea behind double QN?
In Double DQN, the Bellman equation used to calculate the Q-value targets for updating the online network is:
value = reward + discount_factor * target_network.predict(next_state)[argmax(online_network.predict(next_state))]
The Bellman equation used to calculate the Q value updates in the original DQN is:
value = reward + discount_factor * max(target_network.predict(next_state))
But the target network used to evaluate the action is updated using the weights of the online_network, so the value fed into the target is basically an old Q value of the action.
Any ideas how adding another network, whose weights are based on the first network, helps?
I really liked the explanation from here:
https://becominghuman.ai/beat-atari-with-deep-reinforcement-learning-part-2-dqn-improvements-d3563f665a2c
"This is actually quite simple: you probably remember from the previous post that we were trying to optimize the Q function defined as follows:
Q(s, a) = r + γ maxₐ’(Q(s’, a’))
Because this definition is recursive (the Q value depends on other Q values), in Q-learning we end up training a network to predict its own output, as we pointed out last time.
The problem of course is that at each minibatch of training, we are changing both Q(s, a) and Q(s’, a’), in other words, we are getting closer to our target but also moving our target! This can make it a lot harder for our network to converge.
It thus seems like we should instead use a fixed target so as to avoid this problem of the network “chasing its own tail”, but of course that isn’t possible since the target Q function should get better and better as we train."

Q-Learning equation in Deep Q Network

I'm completely new to reinforcement learning, so I may be wrong.
My questions are:
Is the Q-Learning equation ( Q(s, a) = r + y * max(Q(s', a')) ) used in DQN only for computing a loss function?
Is the equation recursive? Assume I use DQN for, say, playing Atari Breakout; the number of possible states is very large (assuming the state is a single game frame), so it's not efficient to create a matrix of all the Q-values. The equation should update the Q-value of a given [state, action] pair, so what will it do in the case of DQN? Will it call itself recursively? If it does, the equation can't be calculated, because the recursion will never stop.
I've already tried to find what I want and I've seen many tutorials, but almost none of them show the background; they just implement it using a Python library like Keras.
Thanks in advance and I apologise if something sounds dumb, I just don't get that.
Is the Q-Learning equation ( Q(s, a) = r + y * max(Q(s', a')) ) used in DQN only for computing a loss function?
Yes, generally that equation is only used to define our losses. More specifically, it is rearranged a bit; that equation is what we expect to hold, but it generally does not yet precisely hold during training. We subtract the right-hand side from the left-hand side to compute a (temporal-difference) error, and that error is used in the loss function.
Is the equation recursive? Assume I use DQN for, say, playing Atari Breakout; the number of possible states is very large (assuming the state is a single game frame), so it's not efficient to create a matrix of all the Q-values. The equation should update the Q-value of a given [state, action] pair, so what will it do in the case of DQN? Will it call itself recursively? If it does, the equation can't be calculated, because the recursion will never stop.
Indeed the space of state-action pairs is much too large to enumerate them all in a matrix/table. In other words, we can't use Tabular RL. This is precisely why we use a Neural Network in DQN though. You can view Q(s, a) as a function. In the tabular case, Q(s, a) is simply a function that uses s and a to index into a table/matrix of values.
In the case of DQN and other Deep RL approaches, we use a Neural Network to approximate such a "function". We use s (and potentially a, though not really in the case of DQN) to create features based on that state (and action). In the case of DQN and Atari games, we simply take a stack of raw images/pixels as features. These are then used as inputs for the Neural Network. At the other end of the NN, DQN provides Q-values as outputs. In the case of DQN, multiple outputs are provided; one for every action a. So, in conclusion, when you read Q(s, a) you should think "the output corresponding to a when we plug the features/images/pixels of s as inputs into our network".
Further question from comments:
I think I still don't get the idea... Let's say we did one iteration through the network with state S and we got following output [A = 0.8, B = 0.1, C = 0.1] (where A, B and C are possible actions). We also got a reward R = 1 and set the y (a.k.a. gamma) to 0.95 . Now, how can we put these variables into the loss function formula https://imgur.com/a/2wTj7Yn? I don't understand what's the prediction if the DQN outputs which action to take? Also, what's the target Q? Could you post the formula with placed variables, please?
First a small correction: DQN does not output which action to take. Given inputs (a state s), it provides one output value per action a, which can be interpreted as an estimate of the Q(s, a) value for the input state s and the action a corresponding to that particular output. These values are typically used afterwards to determine which action to take (for example by selecting the action corresponding to the maximum Q value), so in some sense the action can be derived from the outputs of DQN, but DQN does not directly provide actions to take as outputs.
Anyway, let's consider the example situation. The loss function from the image is:
loss = (r + gamma max_a' Q-hat(s', a') - Q(s, a))^2
Note that there's a small mistake in the image: it has the old state s inside Q-hat instead of the new state s'; using s' there is correct.
In this formula:
r is the observed reward
gamma is (typically) a constant value
Q(s, a) is one of the output values from our Neural Network that we get when we provide it with s as input. Specifically, it is the output value corresponding to the action a that we have executed. So, in your example, if we chose to execute action A in state s, we have Q(s, A) = 0.8.
s' is the state we happen to end up in after having executed action a in state s.
Q-hat(s', a') (which we compute once for every possible subsequent action a') is, again, one of the output values from our Neural Network. This time, it's a value we get when we provide s' as input (instead of s), and again it will be the output value corresponding to action a'.
The Q-hat instead of Q there is because, in DQN, we typically actually use two different Neural Networks. Q-values are computed using the same Neural Network that we also modify by training. Q-hat-values are computed using a different "Target Network". This Target Network is typically a "slower-moving" version of the first network. It is constructed by occasionally (e.g. once every 10K steps) copying the other Network, and leaving its weights frozen in between those copy operations.
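To actually put the numbers from the comment into that formula, here is a small worked example in MATLAB-style code; the target-network outputs for s' ([0.4 0.2 0.1]) are made-up values purely for illustration, since they were not given in the question:
gamma  = 0.95;                  % discount factor from the question
r      = 1;                     % observed reward from the question
Q_sA   = 0.8;                   % online-network output for the executed action A
Qhat   = [0.4 0.2 0.1];         % made-up target-network outputs for the next state s'
target = r + gamma*max(Qhat);   % 1 + 0.95*0.4 = 1.38
loss   = (target - Q_sA)^2;     % (1.38 - 0.8)^2 = 0.3364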
Firstly, the Q function is used both in the loss function and for the policy. The actual output of your Q function and the 'ideal' one are used to calculate a loss. Taking the highest value of the output of the Q function over all possible actions in a state is your policy.
Secondly, no, it's not recursive. The equation is actually slightly different from what you have posted (perhaps a mathematician can correct me on this). It is actually Q(s, a) := r + y * max(Q(s', a')). Note the colon before the equals sign. This is called the assignment operator and means that we update the left side of the equation so that it is equal to the right side once (not recursively). You can think of it as being the same as the assignment operator in most programming languages (x = x + 1 doesn't cause any problems).
The Q values will propagate through the network as you keep performing updates anyway, but it can take a while.

Local volatility model Matlab

I am trying to do a Monte Carlo simulation of a local volatility model, i.e.
dSt = sigma(St,t) * St dWt .
Unfortunately the MATLAB sde class cannot be applied, as the function is rather complex.
For this reason I am simulating this SDE manually with the Euler-Maruyama method. More specifically, I used Ito's formula to get an SDE for the log-process Xt = log(St):
dXt = -1/2 sigma^2(exp(Xt),t) dt + sigma(exp(Xt),t) dWt
The code for this is the following:
function [S]=geom_bb(sigma,T,N,m)
% T.. Time horizon, sigma.. standard deviation, N.. timesteps, m.. dimensions
X=zeros(N+1,m);
dt=T/N;
t=(0:N)'*dt;
dW=randn(N,m);
for j=1:N
X(j+1,:)=X(j,:) - 1/2* sigma(exp(X(j,:)),t(j))^2 * sqrt(dt) + sigma(exp(X(j,:)),t(j))*dW(j,:);
end
S=exp(X*sqrt(dt));
end
This code works rather well for small sigma; however, for sigma around 10 the process S always tends to zero. This should not happen, as S is a martingale and therefore has expectation 1 (at least for constant sigma).
However, X should be simulated correctly, as its mean is exact.
Can anyone help me with this issue? Is this only due to numerical rounding errors? Is there another simulation method that should be preferred to solve this problem?
First, are you sure S=exp(X*sqrt(dt)) outside the loop is doing what you want? Why not have it inside the loop to start with? You're using exp(X) for sigma() inside the loop in any case, which is now missing the sqrt(dt).
Beyond that, suggested ways to improve behavior: use the Milstein scheme instead, increase the number of timesteps, make sure your sigma() value is commensurate with your timestep. Sigma of 10 means 1000% volatility, i.e. moves of 60% per day. Assuming dt is more than a few minutes, this simply can't be good.
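For reference, a minimal sketch of the standard Euler-Maruyama discretization of the log-process from the question, with the Brownian increments scaled by sqrt(dt) and the drift multiplied by dt rather than sqrt(dt); the function name local_vol_em and the choice X(0) = 0 (i.e. S(0) = 1) are assumptions, not part of the original post:
function S = local_vol_em(sigma, T, N, m)
% sigma.. function handle sigma(S,t), T.. time horizon, N.. timesteps, m.. paths
dt = T/N;
t  = (0:N)'*dt;
X  = zeros(N+1, m);                      % X = log(S), started at 0, i.e. S0 = 1
dW = sqrt(dt)*randn(N, m);               % Brownian increments ~ N(0, dt)
for j = 1:N
    s = sigma(exp(X(j,:)), t(j));        % local volatility at the current state
    X(j+1,:) = X(j,:) - 0.5*s.^2*dt + s.*dW(j,:);
end
S = exp(X);                              % no extra sqrt(dt) scaling at the end
end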

Design optimization of material cost of water tank

One way to improve engineering designs is by formulating the equations describing the design in the form of a minimization or maximization problem. This approach is called design optimization. Examples of quantities to be minimized are energy consumption and construction materials. Items to be maximized are useful life and capacity, such as the vehicle weight that can be supported by a bridge. In this project, we consider the problem of minimizing the material cost associated with building a water tank. The water tank consists of a cylindrical part of radius r and height h, and a hemispherical top. The tank is to be constructed to hold 500 cubic meters when filled. The surface area of the cylindrical part is 2*pi*r*h, and its volume is pi*r^2*h. The surface area of the hemispherical top is given by 2*pi*r^2, and its volume is given by 2*pi*r^3/3. The cost to construct the cylindrical part of the tank is $300 per square meter of surface area; the hemispherical part costs $400 per square meter. Use the fminbnd function to compute the radius that results in the least cost. Compute the corresponding height h.
I got the right answer, but it is very chaotic. I created a bunch of functions. I wonder if I can create just one function? Let's name it ONEFUN.
function R = findR(x)
h = (1500-2.*pi*x.^3)./(3.*pi.*x.^2);
R = 2.*pi.*x.*(h) + 2.*pi.*x.^2+pi.*x.^2;
function H = findH(x)
H = (1500-2.*pi*x.^3)./(3.*pi.*x.^2);
function [Cc, Chs, Tc] = Costs(r,h) % Cc - Cost of Cylinder, Chs - Cost of Hemishpere,
%Tc - Total Cost
Cc = ((2.*pi.*r.*h) + (pi.*r.^2)).*300;
Chs = (2.*pi.*r.^2).*400;
Tc = Cc+Chs;
I thought of using a switch on the response, but I have no idea how to do it.
function Anwsers
response = input('Type "find r", "find h", "costHS", "costC", "total": ','s');
response = lower(response);
switch response
case 'find r'
Radius = fminsearch(@ONEFUN, [1]);
case 'find h'
Height = findH(r)
case 'costHS'
case 'costC'
case 'total'
otherwise
disp('You have not entered a proper choice.')
end
I would appreciate any help.
Doing it in one function is a bad idea. Lots of simple functions that each do one thing is good.
Most of the chaos, from my point of view, comes from terse names, magic numbers, relying on operator precedence, and duplication.
h = (1500- (2.*pi*x.^3)./(3.*pi.*x.^2)); for instance, I think ...
Why aren't you calling the function of the same name? It's the same code twice.
Where in Cthulhu's name do the numbers 1500, 300 and 400 come from?
I've never been keen on single-character function names myself, but that might be my lack of familiarity with expressing a problem mathematically.
This is a typical problem of minimizing a function with constraints. That is, you want to minimize the Cost(R,H), while keeping the Volume(R,H) fixed, and you have a simple (two-variable) equation for each of these.
For this you could use the matlab function fmincon.
The above is the most direct computational approach, but there are other ways to solve it, incorporating the constraint into the solution analytically to various degrees. You could, for example, do a full analytic solution, or solve the Volume equation for H and then substitute this into the Cost equation (i.e., Cost(R,H) -> Cost(R)) and then just minimize over R, etc. The approach you used is within this partially analytic middle ground, but it's a bit messier for it.
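For illustration, a minimal sketch of that partially analytic approach using fminbnd, as the assignment asks: h is eliminated via the 500 m^3 volume constraint (the same expression as findH above), the cost follows the surface areas from the problem statement, and the search interval [0.1, 20] metres is an arbitrary assumption:
h    = @(r) (1500 - 2*pi*r.^3)./(3*pi*r.^2);        % height from the 500 m^3 volume constraint (same as findH)
cost = @(r) 300*(2*pi*r.*h(r)) + 400*(2*pi*r.^2);   % cylinder side at $300/m^2 plus hemisphere at $400/m^2
[r_opt, cost_min] = fminbnd(cost, 0.1, 20);         % radius giving the least material cost
h_opt = h(r_opt);                                   % corresponding height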

Matlab: poor accuracy of optimizers/solvers

I am having difficulty achieving sufficient accuracy in a root-finding problem on Matlab. I have a function, Lik(k), and want to find the value of k where Lik(k)=L0. Basically, the problem is that various built-in Matlab solvers (fzero, fminbnd, fmincon) are not getting as close to the solution as I would like or expect.
Lik() is a user-defined function which involves extensive coding to compute a numerical inverse Laplace transform, etc., and I therefore do not include the full code. However, I have used this function extensively and it appears to work properly. Lik() actually takes several input parameters, but for the current step, all of these are fixed except k. So it is really a one-dimensional root-finding problem.
I want to find the value of k >= 165.95 for which Lik(k)-L0 = 0. Lik(165.95) is less than L0 and I expect Lik(k) to increase monotonically from here. In fact, I can evaluate Lik(k)-L0 in the range of interest and it appears to smoothly cross zero: e.g. Lik(165.95)-L0 = -0.7465, ..., Lik(170.5)-L0 = -0.1594, Lik(171)-L0 = -0.0344, Lik(171.5)-L0 = 0.1015, ... Lik(173)-L0 = 0.5730, ..., Lik(200)-L0 = 19.80. So it appears that the function is behaving nicely.
However, I have tried to "automatically" find the root with several different methods and the accuracy is not as good as I would expect...
Using fzero(@(k) Lik(k)-L0): If constrained to the interval (165.95,173), fzero returns k=170.96 with Lik(k)-L0=-0.045. Okay, although not great. And for practical purposes, I would not know such a precise upper bound without a lot of manual trial and error. If I use the interval (165.95,200), fzero returns k=167.19 where Lik(k)-L0 = -0.65, which is rather poor. I have been running these tests with Display set to iter so I can see what's going on, and it appears that fzero hits 167.19 on the 4th iteration and then stays there on the 5th iteration, meaning that the change in k from one iteration to the next is less than TolX (set to 0.001) and thus the procedure ends. The exit flag indicates that it successfully converged to a solution.
I also tried minimizing abs(Lik(k)-L0) using fminbnd (giving upper and lower bounds on k) and fmincon (giving a starting point for k) and ran into similar accuracy issues. In particular, with fmincon one can set both TolX and TolFun, but playing around with these (down to 10^-6, much higher precision than I need) did not make any difference. Confusingly, sometimes the optimizer even finds a k-value on an earlier iteration that is closer to making the objective function zero than the final k-value it returns.
So, it appears that the algorithm is iterating to a certain point, then failing to take any further step of sufficient size to find a better solution. Does anyone know why the algorithm does not take another, larger step? Is there anything I can adjust to change this? (I have looked at the list under optimset but did not come up with anything useful.)
Thanks a lot!
As you seem to have a 'wild' function that does appear to be monotone in the region, a fairly small range of interest, and not a very high precision requirement, I think all criteria are met for recommending the brute-force approach.
Assuming it does not take too much time to evaluate the function at a point, please try this:
Find an upper bound xmax and a lower bound xmin, choose a preferred stepsize, and evaluate your function at
xmin:stepsize:xmax
If required (and if monotonicity really applies), this gives you a new upper and lower bound; repeat the process for better accuracy.
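With the numbers from the question, that scan might look something like the following sketch (the 0.05 step size is an arbitrary choice; Lik and L0 are the asker's function and target value):
ks = 165.95:0.05:200;                    % grid over the range of interest from the question
g  = arrayfun(@(k) Lik(k) - L0, ks);     % residual at each grid point
i  = find(g > 0, 1);                     % first positive residual (g at 165.95 is negative per the question)
bracket = [ks(i-1), ks(i)];              % the root lies in this bracket; repeat with a finer step if needed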
I also encountered this problem while using fmincon. Here is how I fixed it.
I needed to find the solution of a single-variable function within an optimization loop (over multiple variables). Because of this, I needed to provide a large interval for the solution of the single-variable function. The problem is that fmincon (or fzero) does not converge to a solution if the search interval is too large. To get past this, I solve the problem inside a while loop, with a huge starting upper bound (1e200) and a check on the fval value returned by the solver. If the resulting fval is not small enough, I decrease the upper bound by a factor. The code looks something like this:
fval = 1;
factor = 1;
while fval>1e-7
UB = factor*1e200;
[x,fval,exitflag] = fminbnd(@(x)function(x,...),LB,UB,options);
factor = factor * 0.001;
end
The solver exits the while loop when a good solution is found. You can of course also play with the LB by introducing another factor and/or increasing the factor step.
My 1st language isn't English so I apologize for any mistakes made.
Cheers,
Cristian
Why not use a simple bisection method? You always evaluate the middle of a certain interval and then reduce it to the right or left part, so that you always have one bound giving a negative value and the other bound giving a positive value. You can get to arbitrary precision very quickly: since you halve the interval each time, it converges very fast.
I would suspect, however, that there is some other problem with that function, such as discontinuities. It seems strange that fzero would work so badly. It's a deterministic function, right?
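A minimal sketch of such a bisection for the setup in the question, bracketing between 165.95 (where Lik(k)-L0 < 0) and 200 (where it is > 0); the 1e-4 tolerance is an arbitrary choice:
g  = @(k) Lik(k) - L0;          % Lik and L0 are the asker's function and target value
lo = 165.95;  hi = 200;         % bracket: g(lo) < 0, g(hi) > 0 per the question
while hi - lo > 1e-4
    mid = (lo + hi)/2;
    if g(mid) > 0
        hi = mid;               % root lies in the lower half
    else
        lo = mid;               % root lies in the upper half
    end
end
k_root = (lo + hi)/2;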