In my engineering class we are programming a "non-trivial" predator/prey pursuit problem.
Here's the gist of the situation:
There is a prey that is trying to escape a predator. Each can be modeled as a particle that can be animated in MATLAB (we have to use this coding language).
The prey:
can maneuver (turn) more easily than the predator can
The predator:
can move faster than the prey
I have to create code for both the predator and the prey, which will be used in a class competition.
This is basically what the final product will look like:
http://www.brown.edu/Departments/Engineering/Courses/En4/Projects/pred_prey.gif
The goal is to catch the other team's prey in the shortest amount of time, and for my prey to be uncatchable by the other team's predator (or at least to evade it for as long as possible).
Here are the specific design constraints (from the assignment):
Predator and prey can only move in the x-y plane
Simulations will run for a time period of 250 seconds.
Both predator and prey will be subjected to three forces: (a) a propulsive force; (b) a viscous drag force; and (c) a random time-varying force (all equations are given).
The propulsive forces will be determined by functions provided by the two competing groups.
The predator is assumed to catch the prey if the distance between predator and prey drops below 1m.
You may not use the rand() function in computing your predator/prey forces; the only random forces should be those generated by the script provided. (Equations of motion with random forces are impossible for the ODE solver to integrate, and it ends up in an infinite loop.)
For the competition, we will provide the MATLAB code that will compute and animate the trajectories of the competitors and determine the winner of each contest. The test code will work in SI units.
I am looking for any resources that may be able to help me with some strategy. I have looked at basic pursuit curves, but I would love to look at some examples where the prey is not moving in a straight line. Any other coding advice or strategies would be greatly appreciated!
It's a good idea to start with the fundamentals in any field, and you can't go past the work of Isaacs (Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization). It will almost certainly end up as a reference in any academic write-up of the project.
Steven LaValle's excellent book Planning Algorithms covers a number of relevant topics, including a section on visibility-based pursuit-evasion.
As with many mathematical topics, Wolfram MathWorld has some good diagrams and links that might get you thinking in the right direction (e.g. Pursuit Curves).
If you want to look at a curious, well-understood problem in the area, try the homicidal chauffeur problem; it will at least give you some grounds for comparing the complexity and efficiency of different techniques. In particular, it is probably a good way to get a feel for level set methods (the paper "Homicidal Chauffeur Game. Computation of Level Sets of the Value Function" by Patsko and Turova has a number of images that might be helpful).
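To get a feel for pursuit curves when the prey is not moving in a straight line, a quick numerical experiment helps. Below is a rough Python sketch (the competition itself is in MATLAB, and every parameter value here is made up): the prey runs in a circle at constant speed, the predator applies a fixed-magnitude propulsive force aimed at the prey's current position, and the predator also feels a viscous drag force, roughly as in the assignment.

import numpy as np

dt, T = 0.01, 60.0                       # time step and total simulated time (made-up values)
m_pred, c_drag, F_prop = 1.0, 0.5, 3.0   # predator mass, drag coefficient, propulsive force

prey_speed, prey_radius = 1.0, 5.0
pred_pos = np.array([0.0, -10.0])
pred_vel = np.zeros(2)

for step in range(int(T / dt)):
    t = step * dt
    # Prey moves on a circle instead of a straight line
    ang = prey_speed * t / prey_radius
    prey_pos = prey_radius * np.array([np.cos(ang), np.sin(ang)])
    # Classic pure pursuit: aim the propulsive force at the prey's current position
    to_prey = prey_pos - pred_pos
    dist = np.linalg.norm(to_prey)
    if dist < 1.0:                       # 1 m catch radius, as in the assignment
        print(f"caught at t = {t:.2f} s")
        break
    force = F_prop * to_prey / dist - c_drag * pred_vel   # propulsion plus viscous drag
    pred_vel += force / m_pred * dt
    pred_pos += pred_vel * dt

Swapping the aim point from the prey's current position to a predicted future position (pure pursuit versus lead pursuit) is the first strategy comparison worth making, and the same loop can be flipped around to test evasion ideas for the prey.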
I am working on driving industrial robots with neural nets and so far it is working well. I am using the PPO algorithm from the OpenAI baseline and so far I can drive easily from point to point by using the following rewarding strategy:
I calculate the normalized distance between the target and the current position. Then I calculate the distance reward as:
rd = 1-(d/dmax)^a
For each time step, I give the agent a penalty calculated as:
yt = 1-(t/tmax)*b
a and b are hyperparameters to tune.
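For concreteness, this is roughly how the two terms are computed each step (a minimal sketch; the function and variable names are just for illustration, and the default values of a and b are placeholders):

import numpy as np

def step_reward(position, target, d_max, t, t_max, a=2.0, b=1.0):
    # a and b are the hyperparameters mentioned above (defaults here are placeholders)
    d = np.linalg.norm(np.asarray(target) - np.asarray(position))
    rd = 1.0 - (d / d_max) ** a          # distance term: 1 at the target, smaller far away
    yt = 1.0 - (t / t_max) * b           # time term: shrinks as the episode goes on
    return rd, yt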
As I said, this works really well if I want to drive from point to point. But what if I want to drive around something? For my work I need to avoid collisions, so the agent needs to drive around objects. If the object is not directly in the way of the shortest path, this works OK: the robot can adapt and drive around it. But it becomes increasingly difficult, to the point of impossible, to drive around objects that sit directly in the way.
See this image :
I already read a paper that combines PPO with NES to create some Gaussian noise for the parameters of the neural network, but I can't implement it myself.
Does anyone have some experience with adding more exploration to the PPO algorithm? Or does anyone have some general ideas on how I can improve my rewarding strategy?
What you describe is actually one of the most important research areas of Deep RL: the exploration problem.
The PPO algorithm (like many other "standard" RL algorithms) tries to maximise a return, which is a (usually discounted) sum of the rewards provided by your environment: R = r_0 + gamma*r_1 + gamma^2*r_2 + ..., where gamma in [0, 1] is the discount factor.
In your case you have a deceptive gradient problem: the gradient of your return points directly at your goal position (because your reward is based on the distance to the goal), which discourages your agent from exploring other areas.
Here is an illustration of the deceptive gradient problem from this paper: the reward is computed like yours, and as you can see, the gradient of the return function points directly at the objective (the little square in this example). If your agent starts in the bottom-right part of the maze, it is very likely to get stuck in a local optimum.
There are many ways to deal with the exploration problem in RL. In PPO, for example, you can add some noise to your actions; other approaches like SAC try to maximise both the reward and the entropy of your policy over the action space. But in the end you have no guarantee that adding exploration noise in your action space will result in efficient exploration of your state space (which is what you actually want to explore: the (x, y) positions of your environment).
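As a minimal illustration of action-space exploration (not tied to any particular library; the noise scale and action bounds below are made-up values), you can perturb the action the policy proposes before executing it, but note that noisy actions do not automatically translate into good coverage of the (x, y) state space:

import numpy as np

def exploratory_action(policy_action, noise_scale=0.1, low=-1.0, high=1.0):
    # Add zero-mean Gaussian noise to the proposed action, then clip back into range
    noisy = policy_action + np.random.normal(0.0, noise_scale, size=np.shape(policy_action))
    return np.clip(noisy, low, high)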
I recommend reading the Quality Diversity (QD) literature, which is a very promising field aiming to solve the exploration problem in RL.
Here are two great resources:
A website gathering information about QD
A talk from ICML 2019
Finally, I want to add that the problem is not your reward function: you should not try to engineer a complex reward function just so that your agent behaves the way you want. The goal is to have an agent that can solve your environment despite pitfalls like the deceptive gradient problem.
I've been playing with solar system simulation lately using Barnes-Hut algorithm to speed things up.
Now, the simulation works fine when fed our solar system's data, but I'd like to test it on something bigger.
I tried generating 500+ random bodies and even gave them initial orbital motion around the centre of gravity, but every time, after a short while, most of the bodies end up ejected far away into space.
Are there any methods to generate random sets of planets/stars for simulations like this that will remain relatively stable?
You should probably ask this question on the Physics or Mathematics stackexchange.
I think this is a very difficult question, to the point that great mathematicians have studied the stability of the solar system. Things are "easy" for the two-body problem, but the three-body problem is notorious for its chaotic behavior (Poincaré studied it carefully and in the process laid the foundations of the qualitative theory of dynamical systems). If I am not mistaken (feel free to check this online), instability of the orbital dynamics of a large number of bodies (large meaning three or more) occurs with very high probability, while coming across stable configurations has a very low probability.
Now, for so-called integrable systems ("exactly solvable"), like n copies of decoupled sun-plus-one-planet models of a solar/star system, small perturbations are more likely to yield stable dynamics, due to the Kolmogorov-Arnold-Moser (KAM) theorem. So I can say that you are more likely to come across stability if you first set up the bodies in your simulation as comparatively small gravity sources orbiting one significantly larger gravitational source. Each body then feels one dominating force from the large source and many much smaller perturbations from the rest of the bodies (or the averaged sources of your Barnes-Hut algorithm). If you consider only the dominating force and turn off the perturbations, you have a solar system of n decoupled two-body systems (each body following elliptical motion around a common gravitational center). If you turn on the perturbations, the dynamics changes, but it tends to deviate from the unperturbed dynamics very slowly and is more likely to remain stable. So start with highly ordered dynamics and then change the bodies' masses, positions, and velocities slightly, following how the dynamics changes as you alter the parameters and the initial conditions.
One more thing: it is always a good idea to place the inertial coordinate system, with respect to which the positions and velocities of the bodies are represented, at the center of mass of the group of bodies. This is more or less guaranteed when the initial momenta sum to the zero vector: the center of mass of the system then stays fixed at some point in space, so a simple translation moves it to the origin of the coordinate system.
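Following that recipe, here is a rough 2D sketch (all constants, counts, and ranges are arbitrary) that places many light bodies on near-circular orbits around one heavy central body, adds small perturbations, and then shifts everything so the total momentum is zero and the centre of mass sits at the origin:

import numpy as np

G, M_central = 1.0, 1000.0                    # arbitrary units
n_bodies = 500

rng = np.random.default_rng(0)
masses = rng.uniform(0.001, 0.01, n_bodies)
radii = rng.uniform(5.0, 50.0, n_bodies)
angles = rng.uniform(0.0, 2 * np.pi, n_bodies)

positions = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
speeds = np.sqrt(G * M_central / radii)       # circular-orbit speed around the dominant mass
velocities = np.column_stack([-speeds * np.sin(angles), speeds * np.cos(angles)])
velocities *= rng.uniform(0.98, 1.02, (n_bodies, 1))   # small perturbations on the ordered setup

# Add the central body, then move to the centre-of-mass frame (zero total momentum)
masses = np.append(masses, M_central)
positions = np.vstack([positions, [0.0, 0.0]])
velocities = np.vstack([velocities, [0.0, 0.0]])
positions -= (masses[:, None] * positions).sum(axis=0) / masses.sum()
velocities -= (masses[:, None] * velocities).sum(axis=0) / masses.sum()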
I'm creating an evolution/artificial-life simulation game in 2D (purely for fun). It combines neural networks (for behaviour control) and a genetic algorithm (for breeding and mutations).
As inputs I give them the X,Y position of the nearest food (normalized) and the X,Y components of their "look at" vector.
Currently they fly around, and when they collide with food (let's call it "eating apples") their fitness index is increased by one and the apple's position is re-randomized; after 2000 turns the GA kicks in and does its magic.
After about 100 generations they learn that eating apples is good and try to fly to the nearest ones.
But my question, as a neural network newbie, is: if I created a room where apples spawn far more frequently than on the rest of the map, would they learn and understand that? Would they fly to that room more often? And is it possible to tell how many generations it would take for them to learn this?
What they can learn and how fast depends a lot on the information you give them access to. For instance, if they have no way of knowing that they are in the room where food generates more frequently, then there is no way for them to evolve to go there more frequently.
It's not entirely clear from your question what the "look at" vector is. If it, for instance, shows them what's directly in front of them, then it might be enough information for them to figure out that they're in the room of plenty, particularly if that room "looks" distinctive somehow. A more useful input to give them might be their current X and Y coordinates. If you did that, then I would definitely expect them to evolve to be in the good room more frequently (in proportion to how good it is, of course), because it would be possible for them to take action to go to and stay in that room.
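As a purely illustrative sketch of what that expanded input vector could look like (the function and parameter names are mine, not from the question):

import numpy as np

def build_inputs(agent_pos, agent_heading, food_pos, world_size):
    # Normalized direction to the nearest food, as in the original setup
    to_food = np.asarray(food_pos, dtype=float) - np.asarray(agent_pos, dtype=float)
    to_food /= np.linalg.norm(to_food)
    # "Look at" vector derived from the agent's heading angle
    look_at = np.array([np.cos(agent_heading), np.sin(agent_heading)])
    # New: the agent's own position scaled to [0, 1], so it can learn where it is on the map
    own_xy = np.asarray(agent_pos, dtype=float) / np.asarray(world_size, dtype=float)
    return np.concatenate([to_food, look_at, own_xy])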
As for how many generations it will take, that is incredibly hard to predict (especially without knowing more about your setup). If it takes them 100 generations to learn to eat food, then I would expect it to be on the order of hundreds. But the best way to find out is just to try it.
If it's all about location, they could keep a map of the world in mind, and simple statistics would let them learn where the food tends to be located. Neural nets are overkill there.
If locations have other features (for example colour, smell, height, etc.), then mapping those features to a label (food present or not) is a good fit for neural nets, especially if some of the features are randomly unavailable or unreliable at a given moment.
If they need to make many decisions to reach the goal, you will need reinforcement learning. For example, they may go in a direction that is good for a while but takes them away from resources they will need later.
I believe that a recurrent neural network could learn to expect apples to spawn in a certain region.
I am reading about soft computing algorithms, currently "Particle Swarm Optimization". I understand the technique in general, but I am stuck on the mathematical/physical part that I cannot picture: the first term of the velocity-update equation, called the "inertia factor", and how it affects the particles' flight.
The complete velocity-update equation is:
v_i(t+1) = w*v_i(t) + c1*r1*(pbest_i - x_i(t)) + c2*r2*(gbest - x_i(t))
(w is the inertia weight, c1 and c2 are the acceleration coefficients, and r1 and r2 are random numbers in [0, 1].)
I read in one article, in section 2.3 "Inertia Factor", that:
"This variation of the algorithm aims to balance two possible PSO tendencies (de-
pendent on parameterization) of either exploiting areas around known solutions
or explore new areas of the search space. To do so this variation focuses on the
momentum component of the particles' velocity equation 2. Notice that if you
remove this component the movement of the particle has no memory of the pre-
vious direction of movement and it will always explore close to a found solution.
On the other hand if the velocity component is used, or even multiplied by a w
(inertial weight, balances the importance of the momentum component) factor
the particle will tend to explore new areas of the search space since it cannot
easily change its velocity towards the best solutions. It must rst \counteract"
the momentum previously gained, in doing so it enables the exploration of new
areas with the time \spend counteracting" the previous momentum. This vari-
ation is achieved by multiplying the previous velocity component with a weight
value, w."
The full PDF is at: http://web.ist.utl.pt/~gdgp/VA/data/pso.pdf
But I can't picture how this happens, physically or numerically, or how this factor moves the search from exploration to exploitation, so I need a numerical example to see how it works.
Also, in genetic algorithms there is the schema theorem, which is offered as a proof that a GA can find an optimal solution. Is there such a theorem for PSO?
It's not easy to explain PSO using mathematics (see Wikipedia article for example).
But you can think like this: the equation has 3 parts:
particle speed = inertia + local memory + global memory
So you control the 'importance' of these components by varying the coefficients in each part.
There's no analytical way to see this, unless you make the stochastic part constant and ignore things like particle-particle interaction.
Exploit: take advantage of the best known solutions (local and global).
Explore: search in new directions, but don't ignore the best known solutions.
In a nutshell, you can control how much importance to give to the particle's current speed (inertia), the particle's memory of its own best known solution, and the particle's memory of the swarm's best known solution.
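Here is a small numerical experiment along those lines (stochastic factors held constant, as suggested above; all values are made up): one particle in 1D with pbest = gbest = 0, starting at x = 10 with velocity +5, i.e. moving away from the best known solution. Compare how the trajectory changes with a low and a high inertia weight.

def run(w, c1=2.0, c2=2.0, r1=0.25, r2=0.25, steps=6):
    x, v, pbest, gbest = 10.0, 5.0, 0.0, 0.0
    history = []
    for _ in range(steps):
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        history.append(round(x, 2))
    return history

print("w = 0.1:", run(0.1))   # quickly settles onto the best known solution (exploitation)
print("w = 0.9:", run(0.9))   # keeps overshooting and sweeps a wide range (exploration)

With w = 0.1 the positions collapse towards 0 within a few steps, while with w = 0.9 the leftover momentum has to be "counteracted" first, so the particle keeps swinging through new regions of the search space for much longer.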
I hope it can help you!
Inertia was not part of the original PSO algorithm introduced by Kennedy and Eberhart in 1995. It took three years until Shi and Eberhart published this extension and showed (to some extent) that it works better.
One can set that value to a constant (supposedly [0.8 to 1.2] is best).
However, the point of the parameter is to balance exploitation and exploration of the search space, and the authors got their best results when they defined the parameter as a linear function that decreases over time from 1.4 to 0.
Their rationale was that one should first explore the space to find a good seed and later exploit the area around that seed.
My feeling about it is that the closer you get to 0, the more chaotic the particles' turns become.
For a detailed answer refer to Shi, Eberhart 1998 - "A modified Particle Swarm Optimizer".
Inertia controls the influence of the previous velocity.
When it is high, the cognitive and social components are less relevant (the particle keeps going its own way, exploring new portions of the space).
When it is low, the particle concentrates its search around the space where the best-so-far optimum has been found.
Inertia can change over time: start high, then decrease it later.
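A minimal sketch of such a schedule (the start and end values below are just common examples, not prescribed by this answer):

def inertia(t, t_max, w_start=0.9, w_end=0.4):
    # Linear decay: high inertia early (exploration), low inertia late (exploitation)
    return w_start - (w_start - w_end) * (t / t_max)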
I need to program an algorithm to navigate a robot through a "maze" (a rectangular grid with a starting point, a goal, empty spaces and uncrossable spaces or "walls"). It can move in any cardinal direction (N, NW, W, SW, S, SE, E, NE) with constant cost per move.
The problem is that the robot doesn't "know" the layout of the map. It can only view its 8 surrounding spaces and store them (it memorizes the surrounding tiles of every space it visits). The only other input is the cardinal direction in which the goal lies, given on every move.
Is there any researched algorithm that I could implement to solve this problem? The typical ones like Dijkstra's or A* aren't trivially adapted to the task, as I can't revisit previous nodes in the graph for free (retracing the robot's steps to switch to a better path would cost those moves again), and I can't think of a way to make a reasonable heuristic for A*.
I probably could come up with something reasonable, but I just wanted to know if this was an already solved problem, and I need not reinvent the wheel :P
Thanks for any tips!
The problem isn't solved, but like with many planning problems, there is a large amount of research already available.
Most of the work in this area is based on the original work of R. E. Korf in the paper "Real-time heuristic search". That paper seems to be paywalled, but the preliminary results from the paper, along with a discussion of the Real-Time A* algorithm are still available.
The best recent publications on discrete planning with hidden state (path-finding with partial knowledge of the graph) are by Sven Koenig. This includes the significant work on the Learning Real-Time A* algorithm.
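To give a flavour of that family of methods, here is a minimal LRTA*-style sketch on a grid (my own simplification, not Korf's or Koenig's exact formulation): the agent keeps a table h of heuristic estimates, always moves to the neighbour minimising move cost plus h, and raises h for the cell it just left so that repeated visits gradually stop looking attractive.

import math

def lrta_star_step(pos, goal, h, blocked, grid_w, grid_h):
    # 8-connected moves (N, NE, E, SE, S, SW, W, NW), each with unit cost
    moves = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]

    def heuristic(cell):
        # Default estimate: Chebyshev distance, admissible for 8-way unit-cost movement
        return h.get(cell, max(abs(cell[0] - goal[0]), abs(cell[1] - goal[1])))

    best_next, best_f = None, math.inf
    for dx, dy in moves:
        nxt = (pos[0] + dx, pos[1] + dy)
        if not (0 <= nxt[0] < grid_w and 0 <= nxt[1] < grid_h) or nxt in blocked:
            continue
        f = 1 + heuristic(nxt)            # move cost plus estimated remaining cost
        if f < best_f:
            best_next, best_f = nxt, f

    # Learning step: the current cell is at least as expensive as the best option from it
    h[pos] = max(heuristic(pos), best_f)
    return best_next

Called once per physical move, with blocked growing as the robot's 8-cell view reveals walls, this needs no free backtracking: the updated h values are what eventually steer the robot out of dead ends it has already paid to enter.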
Koenig's work also includes demonstrations of a range of algorithms on theoretical testbeds that are far more challenging than anything likely to occur in a simulation. See in particular "Easy and Hard Testbeds for Real-Time Search Algorithms" by Koenig and Simmons.