This question combines math and programming. I will first describe the general problem and then give an example that is (hopefully) simpler to understand.
General Question: Consider a Markov-chain process of N-states with transition matrix Π. Each state is associated with a value x_n (n in {1,…,n}). Our goal is to find the unconditional average of the first two moments (mean and var) along T-period paths conditional on (i) the path starts in a subset of states, N_0, (ii) it ends in a subset of states, N_T, and (iii) it is not going through a subset of states, N_not, in any of the periods between 1 to T-1. By saying we are interested in the unconditional average of these two moments, I basically mean what would be the average of these two moments in the stationary distribution. To be more concrete, let me illustrate the goal of the exercise in a simple case.
Simple Example: Consider a 3-state Markov-chain process with transition matrix Π, and let the three state be denoted by A, B, and C. Each of these states are associated with some value (x_A, x_B, and x_C), respectively. We are interested in what happens along paths that satisfy the following condition. The path starts at point A, after 3 periods are in either points B or C, and between periods 1 to 3 never go again through point A. Denote this condition by (#). So, for example, a path which we are interested in would be {A,B,B,C} with the associated values {x_A, x_B, x_B, x_C}. We are interested in the average and standard deviation along such paths. In particular, we would like to find the unconditional average of these first two moments in paths that satisfy (#).
Let me now propose a solution based on simulating the process. Since both T and N are quite large, this solution is too slow for my purpose.
Simulation Solution: Starting from some initial point simulate the process for a very long time period, and drop the first τ periods. Extract all paths along the simulation that satisfy condition (#) and compute the mean and std along each of these paths. Finally, simply take the average across these paths.
I’m hoping there is a better and more efficient way to achieve the goal. Since I want the solution to be accurate and the size of T and N the simulation takes a long time.
I would love to hear your thoughts and if you know of efficient methods to achieve this goal. Please let me know if something is not clear and I'll try to clarify it.
Thank you!!!
I think I know how to do this if N_0 consists of one state, let's call that state A.
The long run probability of being in A is pi(A) and can be obtained by solving pi = pi*P, with P the transition matrix.
The other thing you need to calculate is the probability of those transient paths. You probably need to introduce a modified P, where all states i in the set N_not are absorbing (i.e. P[i,i]=1 and P[i,j]=0 for j is not i). Then starting from a vector p(0) which has a 1 in the element corresponding to state A and 0 otherwise, you can keep calculating p(n) = p(n-1)*P to get the probabilities of your transient paths.
Multiply the result of that by pi(A) to get the unconditional probability.
You can probably do something like this as well when N_0 is a set, but I don't know how you should select p(0) in that case.
Related
I refrained from asking for help until now, but as my thesis' deadline creeps ever closer and I do not know anybody with experience in RL, I'm trying my luck here.
TLDR;
I have not found an academic/online resource which helps me understand the correct representation of the environment as an observation space. I would be very thankful for any links or for giving me a starting point of how to model the specifics of my environment in an observation space.
Short thematic introduction
The goal of my research is to determine the viability of RL for strategy development in motorsports. This is currently achieved by simulating (lots of!) races and calculating the resulting race time (thus end-position) of different strategic decisions (which are the timing of pit stops + amount of laps to refuel for). This demands a manual input of expected inlaps (the lap a pit stop occurs) for all participants, which implicitly limits the possible strategies by human imagination as well as the amount of possible simulations.
Use of RL
A trained RL agent could decide on its own when to perform a pit stop and how much fuel should be added, in order to minizime the race time and react to probabilistic events in the simulation.
The action space is discrete(4) and represents the options to continue, pit and refuel for 2,4,6 laps respectively.
Problem
The observation space is of POMDP nature and needs to model the agent's current race position (which I hope is enough?). How would I implement the observation space accordingly?
The training is performed using OpenAI's Gym framework, but a general explanation/link to article/publication would also be appreciated very much!
Your observation could be just an integer which represents round or position the agent is in. This is obviously not a sufficient representation so you need to add more information.
A better observation could be the agents race position x1, the round the agent is in x2 and the current fuel in the tank x3. All three of these can be represented by a real number. Then you can create your observation by concating these to a vector obs = [x1, x2, x3].
We are given an n*m grid, which has obstacles at various points,the starting and ending location of the bot. The task is to find a collision free path from start to end. This problem is to be modelled as a SAT problem.
Please guide me on what should be done in this case to get an optimal solution.
I would assume that optimal means the shortest. Using the approach that I've described here you can do first steps:
define a grid
formulate a satisfiability task
At this stage, a solver returns to you random path that satisfies all constraints. An important thing to remember - you can define number of steps k which are required to reach a goal! So you just start with k = 0. Is it possible to reach the goal with 0 actions? Probably, not, until an agent is at the goal already. Then just increment k = 1. Is it possible now? If not, increment more! How to implement it? Just set all k's above a certain limit to False and increment this limit each iteration.
If you know upper limits, you can use binary search to find the shortest possible path, which could be more efficient.
If you care for other properties of a path, you can use pseudo-boolean constraints. By leveraging this approach, you can minimize, for example, a number of right turns. Create a Boolean counter for all possible right turns and limit number of available turns via assumptions.
I have a rather large(not too large but possibly 50+) set of conditions that must be placed on a set of data(or rather the data should be manipulated to fit the conditions).
For example, Suppose I have the a sequence of binary numbers of length n,
if n = 5 then a element in the data might be {0,1,1,0,0} or {0,0,0,1,1}, etc...
BUT there might be a set of conditions such as
x_3 + x_4 = 2
sum(x_even) <= 2
x_2*x_3 = x_4 mod 2
etc...
Because the conditions are quite complex in that they come from experiment(although they can be written down in logic form) and are hard to diagnose I would like instead to use a large sample set of valid data. i.e., Data I know satisfies the conditions and is a pretty large set. i.e., it is easier to collect the data then it is to deduce the conditions that the data must abide by.
Having said that, basically what I'm doing is very similar to neural networks. The difference is, I would like an actual algorithm, in some sense optimal, in some form of code that I can run instead of the network.
It might not be clear what I'm actually trying to do. What I have is a set of data in some raw format that is unique and unambiguous but not appropriate for my needs(in a sense the amount of data is too large).
I need to map the data into another set that actually is ambiguous to some degree but also has certain specific set of constraints that all the data follows(certain things just cannot happen while others are preferred).
The unique constraints and preferences are hard to figure out. That is, the mapping from the non-ambiguous set to the ambiguous set is hard to describe(which is why it is ambiguous). The goal, actually, is to have an unambiguous map by supplying the right constraints if at all possible.
So, on the vein of my initial example, I'm given(or supply) a set of elements and need some way to derive a list of constraints similar to what I've listed.
In a sense, I simply have a set of valid data and train it very similar to neural networks.
Then, after this "Training" I'm given the mapping function I can then use on any element in my dataset and it will produce a new element satisfying the constraint's, or if it can't, will give as close as possible an unambiguous result.
The main difference between neural networks and what I'm trying to achieve is I'd like to be able to use have an algorithm to code to be used instead of having to run a neural network. The difference here is the algorithm would probably be a lot less complex, not need potential retraining, and a lot faster.
Here is a simple example.
Suppose my "training set" are the binary sequences and mappings
01000 => 10000
00001 => 00010
01010 => 10100
00111 => 01110
then from the "Magical Algorithm Finder"(tm) I would get a mapping out like
f(x) = x rol 1 (rol = rotate left)
or whatever way one would want to express it.
Then I could simply apply f(x) to any other element, such as x = 011100 and could apply f to generate a hopefully unambiguous output.
Of course there are many such functions that will work on this example but the goal is to supply enough of the dataset to narrow it down to hopefully a few functions that make the most sense(at the very least will always map the training set correctly).
In my specific case I could easily convert my problem into mapping the set of binary digits of length m to the set of base B digits of length n. The constraints prevents some numbers from having an inverse. e.g., the mapping is injective but not surjective.
My algorithm could be a simple collection if statements acting on the digits if need be.
I think what you are looking for here is an application of Learning Classifier Systems, LCS -wiki. There are actually quite a few LCS open-source applications available, but you may need to experiment with the parameters in order to get a good result.
LCS/XCS/ZCS have the features that you are looking for including individual rules that could be heavily optimized, pressure to reduce the rule-set, and of course a human-readable/understandable set of rules. (Unlike a neural-net)
So I have an array of numbers that look something like
1,708,234
2,802,532
11,083,432
5,098,123
5,777,111
I want to find out when two numbers are within a certain distance from each other (say 1,500,000) so I can group them into the same location and have just one UI element represent both for the level of zoom I am looking at. How would one go about doing this smartly or efficiently. I'm thinking I would just start with the first entry, loop through all the elements, and if one was close to another, flag those two and put it in a dictionary of some sort. That would be my brute force method, but I'm thinking there has to be a better way.
I'm coding in obj-c btw if that makes or breaks any design decisions.
How many numbers are we dealing with here? If it's small enough:
Sort the numbers (generally n-log-n)
Run through each number, n, and compare its bigger neighbor, n+1, to see if it's within your range.
Repeat for n+2, n+3, until the number is no longer within your range.
Your brute force method there is O((n/2)^2). This method will bring it to O(n + n log(n)), or O(n log n) on the average case.
Context: I'm trying to improve the values returned by the iPhone CLLocationManager, although this is a more generally applicable problem. The key is that CLLocationManger returns data on current velocity as and when it feels like it, rather than at a fixed sample rate.
I'd like to use a feedback equation to improve accuracy
v=(k*v)+(1-k)*currentVelocity
where currentVelocity is the speed returned by didUpdateToLocation:fromLocation: and v is the output velocity (and also used for the feedback element).
Because of the "as and when" nature of didUpdateToLocation:fromLocation: I could calculate the time interval since it was last called, and do something like
for (i=0;i<timeintervalsincelastcalled;i++) v=(k*v)+(1-k)*currentVelocity
which would work, but is wasteful of cycles. Especially as I probably want timeintervalsincelastcalled to be measured as 10ths of a second.
Is there a way to solve this without the loop ? i.e. rework (integrate?) the formula so I put an interval into the equation and get the same answer as I would have by iteration ?
If you write your original equation as
v = k*vCurrent + (1-k)*v
you can apply the answer from another SO question.
Instead of iterating, you could just choose the value of k based on the size of the interval. For example, if the interval length is an hour - you'd probably want k to be 0.
It would be easy to precompute k for a variety of interval sizes to give the same answer as the iteration would give. Just compute the change by iterating (you already have code for that), and then compute the value of k that would give you that algebraicly.
It's a common programmer jedi trick to have a table of lookup values in place of expensive calculations. (there, now my answer has something to do with code!)