When to use Policy Iteration instead of Value Iteration

I'm currently studying dynamic programming solutions to Markov Decision Processes. I feel like I've got a decent grip on VI and PI and the motivation for PI is pretty clear to me (converging on the correct state utilities seems like unnecessary work when all we need is the correct policy). However, none of my experiments show PI in a favourable light in terms of runtime. It seems to consistently take longer regardless of the size of the state space and discount factor.
This could be due to the implementation (I'm using the BURLAP library), or poor experimentation on my part. However, even the trends don't seem to show a benefit. It should be noted that the BURLAP implementation of PI is actually "modified policy iteration", which runs a limited VI variant at each iteration. My question is: do you know of any situations, theoretical or practical, in which (modified) PI should outperform VI?

Turns out that Policy Iteration, specifically Modified Policy Iteration, can outperform Value Iteration when the discount factor (gamma) is very high.
http://www.cs.cmu.edu/afs/cs/project/jair/pub/volume4/kaelbling96a.pdf
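To make the comparison concrete, here is a minimal tabular sketch of both algorithms in Python/NumPy (not BURLAP; the transition tensor P, reward array R and the number of partial-evaluation sweeps k are placeholders you would supply). With gamma close to 1, value iteration needs many full sweeps to propagate values, whereas modified policy iteration replaces most of them with cheaper backups under a fixed policy, which is the intuition behind the high-gamma advantage mentioned above.

# Illustrative sketch only, not the BURLAP implementation.
# P[a] is an (S, S) transition matrix for action a; R is an (S, A) reward array.
import numpy as np

def value_iteration(P, R, gamma, tol=1e-6):
    S, A = R.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * np.stack([P[a] @ V for a in range(A)], axis=1)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=1)
        V = V_new

def modified_policy_iteration(P, R, gamma, k=20, tol=1e-6):
    # greedy improvement + k sweeps of partial policy evaluation per iteration
    S, A = R.shape
    V = np.zeros(S)
    while True:
        Q = R + gamma * np.stack([P[a] @ V for a in range(A)], axis=1)
        pi = Q.argmax(axis=1)
        if np.max(np.abs(Q.max(axis=1) - V)) < tol:
            return V, pi
        P_pi = np.array([P[pi[s]][s] for s in range(S)])   # row s of P[pi[s]]
        R_pi = R[np.arange(S), pi]
        for _ in range(k):                                 # partial evaluation
            V = R_pi + gamma * (P_pi @ V)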

Related

For a single-cycle CPU, how much energy is required to execute an ADD command?

The question is just what the title says. I've been wondering about this; can any expert help?
OK, this was going to be a long answer, so long that I may write an article about it instead. Strangely enough, I've been working on experiments that are closely related to your question -- determining performance per watt for a modern processor. As Paul and Sneftel indicated, it's not really possible with any real architecture today. You could probably compute this if you looked only at the execution of that instruction, given a certain silicon technology and a certain ALU design, by calculating gate leakage and switching currents, voltages, etc. But that isn't a useful value, because there is always something going on (from a HW perspective) in any processor newer than an 8086, and instructions haven't been executed in isolation since the pipeline first came into being.
Today, we have multi-function ALUs, out-of-order execution, multiple pipelines, hyperthreading, branch prediction, memory hierarchies, etc. What does this have to do with the execution of one ADD command? The energy used to execute one ADD command is different from the execution of multiple ADD commands. And if you wrap a program around it, then it gets really complicated.
SORT-OF-AN-ANSWER:
So let's look at what you can do.
Statistically measure running a given add over and over again. Remember that there are many different types of adds, such as integer adds, floating-point, double precision, adds with carries, and even simultaneous adds (SIMD), to name a few. Limits: OSs and other apps are always there, though you may be able to run on bare metal if you know how; varies with different hardware, silicon technologies, architecture, etc.; probably not useful because it is so far from reality that it means little; limits of measurement equipment (interprocessor PMUs, at-the-wall meters, interposer sockets, etc.); memory hierarchy; and more.
Statistically measure an integer/floating-point/double-based workload kernel. This is beginning to have some meaning because it means something to the community. Limits: still not real; still varies with architecture, silicon technology, hardware, etc.; measuring-equipment limits; etc.
Statistically measure a real application. Limits: same as above, but it at least means something to the community; power states come into play during periods of idle; potentially cluster issues come into play.
When I say "Limits", that just means you need to well define the constraints of your answer / experiment, not that it isn't useful.
SUMMARY: it is possible to come up with a value for one add but it doesn't really mean anything anymore. A value that means anything is way more complicated but is useful and requires a lot of work to find.
By the way, I do think it is a good and important question -- in part because it is so deceptively simple.

ILP solvers with small memory footprint

I'm trying to solve a sequence-labelling problem by formulating it as an integer linear program (as an experiment to see how well doing it in that way works). I've already found some suggestions for solvers on SO but I would like to get some more fine-grained advice due to some constraints I'm under (yes, that pun was actually intended).
I'm running out of memory on more than half of my sequences, due to their length, while using COIN-OR, although I see no reason my problem should need that much memory: this is a Boolean linear program, so I would theoretically need only one bit per feature. However, the COIN Open Solver Interface, for example, seems to support only double values for things like defining constraints.
Are there any (free) ILP packages which are well-suited for either Boolean problems or at least for problems with a very small range of potential values?
CPLEX seems to be considered approximately the state of the art, and in my experience it is often better on hard ILPs than any free solver I have found. Unfortunately, CPLEX is not free except for academic users: IBM offers free access to CPLEX for students and researchers at educational institutions, if you fit that description.
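If you are working from Python, one free option worth a look is PuLP driving the bundled CBC solver; below is a minimal sketch of a purely Boolean formulation (the variable layout, scores and one-label-per-position constraint are placeholders, not your actual model). Be aware that the modelling layer and CBC still store coefficients as doubles internally, so declaring variables as binary helps the solver but won't by itself give you a one-bit-per-feature memory footprint.

# Minimal Boolean ILP sketch with PuLP + CBC (both free); placeholder model only.
import pulp

n_positions, n_labels = 100, 5                 # made-up problem size

prob = pulp.LpProblem("sequence_labelling", pulp.LpMaximize)

# one 0/1 variable per (position, label) pair
x = {(i, l): pulp.LpVariable(f"x_{i}_{l}", cat="Binary")
     for i in range(n_positions) for l in range(n_labels)}

score = {key: 1.0 for key in x}                # replace with real feature scores
prob += pulp.lpSum(score[key] * x[key] for key in x)

# exactly one label per position
for i in range(n_positions):
    prob += pulp.lpSum(x[(i, l)] for l in range(n_labels)) == 1

prob.solve(pulp.PULP_CBC_CMD(msg=False))
print(pulp.LpStatus[prob.status])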

Numerical Integral of large numbers in Fortran 90

So I have the following integral that I need to evaluate numerically:
Int[ Exp(0.5*(a*Cos(x) + b*Sin(x) + c*Cos(2x) + d*Sin(2x))) ], x = 0 .. 2*Pi
The problem is that the integrand at any given value of x can be extremely large, on the order of e^2000, so larger than I can deal with in double precision.
I haven't had much luck googling this: how do you deal with large numbers in Fortran? I don't need high precision -- I don't care about anything beyond double precision, and at the end I'll just be taking the log -- but I need to be able to handle the large numbers until I can take the log.
Are there integration packages that can handle arbitrarily large numbers? Mathematica clearly can, so there must be something like this out there.
Cheers
This is probably an extended comment rather than an answer but here goes anyway ...
As you've already observed, Fortran isn't equipped, out of the box, to handle numbers as large as e^2000. I think you have 3 options.
Use mathematics to reduce your problem to one which does (or a number of related ones which do) fall within the numerical range that your Fortran compiler can compute.
Use Mathematica or one of the other computer algebra systems (eg Maple, SAGE, Maxima). All (I think) of these can be integrated into a Fortran program (with varying degrees of difficulty and integration).
Use a library for high-precision (often also called arbitrary-precision or multiple-precision) arithmetic. Your favourite search engine will turn up a number of these for you, some written in Fortran (and therefore easy to integrate), some written in C/C++ or other languages (and therefore slightly harder to integrate). You might start your search at Lawrence Berkeley or with the GNU bignum library.
(Yes I know that I wrote that you have 3 options, but your question suggests that you aren't ready to consider this yet) You could write your own high-/arbitrary-/multiple-precision functions. Fortran provides everything you need to construct such a library, there is a lot of work already done in the field to learn from, and it might be something of interest to you.
In practice it generally makes sense to apply as much mathematics as possible to a problem before resorting to a computer; that process can not only assist in solving the problem but also guide your selection or construction of a program to solve what's left of it.
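To illustrate option 1: one standard rescaling is to factor the peak of the exponent out of the integral. With f(x) = 0.5*(a*cos(x) + b*sin(x) + c*cos(2x) + d*sin(2x)) and M = max f, you have log Int[exp(f(x))] = M + log Int[exp(f(x) - M)], and the rescaled integrand never exceeds 1, so it fits comfortably in double precision. A rough sketch in Python/NumPy (not Fortran; the coefficients are made up, and the same idea carries over directly):

# Compute log( integral_0^2pi exp(f(x)) dx ) without ever forming exp(2000).
import numpy as np

a, b, c, d = 1500.0, 800.0, -300.0, 120.0     # placeholder coefficients

x = np.linspace(0.0, 2.0 * np.pi, 20001)
f = 0.5 * (a * np.cos(x) + b * np.sin(x) + c * np.cos(2 * x) + d * np.sin(2 * x))

M = f.max()                                             # peak of the exponent
log_integral = M + np.log(np.trapz(np.exp(f - M), x))   # trapezoidal rule
print(log_integral)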
I agree with High Performance Mark that the best option here, numerically, is to use analytical manipulation to scale or simplify the result first.
I will mention that if you do want to brute force it, gfortran (as of 4.6, with the libquadmath library) has support for quadruple precision reals, which you can use by selecting the appropriate kind. As long as your answers (and the intermediate results!) don't get too much bigger than what you're describing, that may work, but it will generally be much slower than double precision.
This requires looking deeper at the problem you are trying to solve and the behavior of the underlying mathematics. To add to the good advice already provided by Mark and Jonathan, consider expanding the exponential and trig functions into Taylor series and truncating to the desired level of precision.
Also, take a step back and ask what you are trying to accomplish by calculating this value. As an example, I recently had to debug why I was getting outlandish results from a property correlation which was calculating the vapor pressure of a fluid to see if condensation was occurring. I spent a long time trying to understand what was wrong with the temperature being fed into the correlation until I realized the case causing the error was a simulation of vapor detonation. The problem was not in the numerics but in the logic of checking for condensation during a literal explosion; physically, a condensation check made no sense. The real problem was that the code was asking an unnecessary question; it already had the answer.
I highly recommend Forman Acton's Numerical Methods That (Usually) Work and Real Computing Made Real. Both focus on problems like this and suggest techniques to tame ill-mannered computations.

On the use of MassWithStopAndFriction and hard stops in OpenModelica

I have a question about hard stops in Modelica.Mechanics.Translational.Components.MassWithStopAndFriction.
As I understand it, the mass should not move outside the interval (smin, smax).
But it actually does in the example that I include here:
model ActuatorMechanics
  Modelica.Mechanics.Translational.Sources.Force force;
  Modelica.Mechanics.Translational.Components.MassWithStopAndFriction mass(m=1, F_prop=0, F_Coulomb=10, smax=0.1, smin=0, L=0.01);
  Modelica.Mechanics.Translational.Components.Spring spring(c=1000);
  Modelica.Mechanics.Translational.Components.Fixed fixed;
  Modelica.Mechanics.Translational.Sensors.PositionSensor sens_pos;
equation
  connect(force.flange, mass.flange_a);
  connect(mass.flange_b, spring.flange_a);
  connect(spring.flange_b, fixed.flange);
  connect(sens_pos.flange, mass.flange_a);
  force.f = 100;
end ActuatorMechanics;
simulate(ActuatorMechanics)
plot(mass.flange_a.s)
Am I doing something wrong?
This was a bug in OpenModelica. It has worked since r11060, and a regression test has been added.
Well, this is really a question for the OpenModelica developers so hopefully one of them will pop in here and answer.
Just to give you a little bit of background on what is going on in the model, when the mass hits a stop, it switches into a different state where it constrains the mass to a zero acceleration (not velocity) and computes the reaction force necessary to hold that constraint.
This complexity is due mostly to the fact that OpenModelica (and most, if not all, other Modelica tools) has difficulty handling variable-index DAEs. The trick here is to detect the point at which the mass reaches its mechanical limit, use reinit to set the velocity to zero, and then enforce the no-acceleration constraint I mentioned above.
This all depends on the velocity being a state. It is possible to formulate a system using this component where the velocity will be prescribed on the system (and therefore not a state). In this case, the reinit has no effect. But I would suspect that you would get a singular system of equations at that point (since you'd have essentially two equations for the acceleration of the mass once the mechanical limit is reached). So I am surprised that you are seeing movement beyond those limits.
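For intuition only, here is a toy Python sketch of that switching logic with a fixed-step explicit integrator and made-up parameters; it is not how OpenModelica handles the model, but it shows the "reinit the velocity to zero at the stop, then hold zero acceleration while the reaction force allows it" idea:

# Toy hard-stop illustration -- not the OpenModelica implementation.
m, c, f_ext = 1.0, 500.0, 100.0   # mass, spring constant, applied force (made up)
smin, smax = 0.0, 0.1
s, v = 0.0, 0.0
dt = 1e-4

for _ in range(200000):
    f_net = f_ext - c * s            # applied force minus spring force
    if s >= smax and f_net > 0.0:
        # "stuck" mode: the stop supplies whatever reaction force is needed
        # to keep the acceleration (not just the velocity) at zero
        s, v = smax, 0.0
    else:
        a = f_net / m                # free motion
        v += a * dt
        s += v * dt
        if s >= smax:                # event: hit the stop -> reinit v to zero
            s, v = smax, 0.0

print(s, v)                          # ends pinned at smax for these parameters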
Another possibility could be that the mass starts its motion outside the mechanical limits and somehow the switch to the alternative (constrained) equations is not made.
Again, it is really a question for the OpenModelica developers. I'm just trying to provide some insights as to things that could go wrong with such a model. Although I admit such insights are not particularly useful in answering your question.
I would suggest you also contact the OpenModelica developers (in addition to posting here on StackOverflow) since they may not see this question.
Good luck.

How to find the time cost of an operation to optimize a new algorithm design?

My question is specific to iPhone, iPod, and iPad, since I am assuming that the architecture makes a big difference. I'm hoping there is either a specification somewhere (for the various chips perhaps), or a reliable way to measure T for each specific instruction. I know I can use any number of tools to measure aggregate processor time used, memory used, etc. I want to quantify at a lower level.
So, I'm able to figure out how many times I go through the main part of the algorithm. For example, I iterate n * (n-1) times in a naive implementation, and between n (best case) and n + n * (n-1) (worst case) in another. I can also make a reasonable count of the total number of instructions (+ - = % * /, and logic statements), and I can compare those counts, but that's assuming the weight of each operation is the same. Also, I don't have any idea how to weight the actual time value of a logic statement (if, else, for, while) vs a mathematical operator... is "if" as much work as "+" each time I use it? I would love to know where to find this information.
So, for clarity, my goal is to discover how much processor time I am demanding of the CPU (or GPU or any U) so that I can design an optimal algorithm around processor time. Can someone give me an idea of where to start for iOS hardware?
Edit: This link to ClockServices.c and SIMD stuff in the developer portal might be a good start for people interested in this. A few more cups of coffee tonight and I might get through it ;)
On a modern platform, processor time isn't the only limiting factor. Often, memory access is.
Still, processor time:
Your basic approach to estimating the processor load is OK, though, and is sensible: make a rough estimate of the cost based on your knowledge of typical platforms.
In this article, Table 1 shows the times for typical primitive operations in .NET. While your platform may vary, the relative time is usually very similar. Maybe you can find - or even make - one for iStuff.
(I haven't come across one so thorough for other platforms, except processor / instruction set manuals, but they deal with assembly instructions)
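If you want to roll your own table, the usual recipe is to time a tight loop over each primitive operation and compare ratios rather than absolute numbers. A hedged sketch of the methodology in Python (the figures it prints reflect the Python interpreter on whatever machine runs it, not iOS hardware; you would repeat the same loop-and-compare measurement in C/Objective-C on the device):

# Rough relative-cost table of primitive operations -- methodology only.
import timeit

ops = {
    "add":      "x + y",
    "multiply": "x * y",
    "divide":   "x / y",
    "modulo":   "x % y",
    "compare":  "x < y",
}

setup = "x, y = 12345.0, 678.0"
baseline = timeit.timeit(ops["add"], setup=setup, number=1_000_000)

for name, expr in ops.items():
    t = timeit.timeit(expr, setup=setup, number=1_000_000)
    print(f"{name:9s} {t / baseline:5.2f}x relative to add")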
memory locality:
A cache miss can cost you hundreds of cycles, a disk access a thousand times as much. So controlling your memory access patterns (i.e. reducing the working set, restructuring and accessing data in a cache-friendly way) is an important part of evaluating an algorithm.
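A quick way to see the memory effect on any machine is to compare contiguous and strided traversal of the same amount of data; a sketch in Python/NumPy (array sizes are arbitrary, exact ratios depend on the cache hierarchy, and part of the gap also comes from NumPy losing its SIMD fast path on strided views):

# Same element count, very different memory behaviour.
import timeit
import numpy as np

a = np.random.rand(1 << 24)          # ~128 MB of doubles
contiguous = a[: len(a) // 16]       # 1/16 of the data, packed together
strided = a[::16]                    # same element count, spread across all of a

t_contig = timeit.timeit(contiguous.sum, number=50)
t_strided = timeit.timeit(strided.sum, number=50)
print(f"contiguous: {t_contig:.3f}s  strided: {t_strided:.3f}s")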
Xcode has Instruments to measure the performance of each function/operation; you can simply use them.