How can I create a race track with a resolution of about 60 units wide and 4000 units long (for up to 20 agents)?
Note: during any simulation, the focus/zoom may be only where the agents are located along the track.
Racetrack shape example:
http://www.offtrackbetting.com/images/racetracks/OP/oaklawn_park_tl.gif
Any useful link or example would be greatly appreciated! Thanks
I refrained from asking for help until now, but as my thesis' deadline creeps ever closer and I do not know anybody with experience in RL, I'm trying my luck here.
TLDR;
I have not found an academic/online resource that helps me understand the correct representation of the environment as an observation space. I would be very thankful for any links, or for a starting point on how to model the specifics of my environment in an observation space.
Short thematic introduction
The goal of my research is to determine the viability of RL for strategy development in motorsports. This is currently achieved by simulating (lots of!) races and calculating the resulting race time (thus end position) of different strategic decisions (which are the timing of pit stops + the number of laps to refuel for). This demands a manual input of expected inlaps (the lap a pit stop occurs) for all participants, which implicitly limits the possible strategies to what humans can imagine, as well as the number of possible simulations.
Use of RL
A trained RL agent could decide on its own when to perform a pit stop and how much fuel should be added, in order to minimize the race time and react to probabilistic events in the simulation.
The action space is Discrete(4) and represents the options to continue, or to pit and refuel for 2, 4, or 6 laps, respectively.
Problem
The environment is of a POMDP nature, and the observation needs to model the agent's current race position (which I hope is enough?). How would I implement the observation space accordingly?
The training is performed using OpenAI's Gym framework, but a general explanation/link to article/publication would also be appreciated very much!
Your observation could be just an integer representing the round or position the agent is in. This is obviously not a sufficient representation, so you need to add more information.
A better observation could be the agent's race position x1, the round (lap) the agent is in x2, and the current fuel in the tank x3. All three of these can be represented by a real number. Then you can create your observation by concatenating these into a vector obs = [x1, x2, x3].
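For example, with OpenAI Gym you could declare the space as a Box and build the vector like this (a minimal sketch; N_DRIVERS, MAX_LAPS and TANK_CAPACITY are illustrative placeholders, not values from the question):

import numpy as np
from gym import spaces

N_DRIVERS = 20         # cars in the race (placeholder)
MAX_LAPS = 60          # race length in laps (placeholder)
TANK_CAPACITY = 120.0  # maximum fuel units (placeholder)

# obs = [x1, x2, x3] = [race position, current lap, fuel in tank]
observation_space = spaces.Box(
    low=np.array([1.0, 0.0, 0.0], dtype=np.float32),
    high=np.array([N_DRIVERS, MAX_LAPS, TANK_CAPACITY], dtype=np.float32),
    dtype=np.float32,
)

# continue, or pit and refuel for 2, 4 or 6 laps
action_space = spaces.Discrete(4)

def make_observation(position, lap, fuel):
    # concatenate the three scalars into the observation vector
    return np.array([position, lap, fuel], dtype=np.float32)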
I run an infectious disease spread model, similar to the "Virus" model in the Models Library, varying the "infectiousness".
I did 20 runs each for infectiousness values 98%, 95%, and 93%, and the maximum infected count was 74.05, 73, and 78.9, respectively (the peak was at tick 38 for all three infectiousness values).
[I took the average of the infected count for each tick and took the maximum of these averages as the "maximum infected".]
I was expecting the maximum infected count to decrease when the infectiousness is reduced, but it didn't. From what I understand, this happens because I considered the average values of each simulation run (it is as if I am considering a new simulation run with the average infected count for each tick).
To be clear, I want to take all 20 simulation runs into account. Is there a way to do that other than taking the average, as I did?
In the Models Library Virus model, with the other parameters at their default settings and those high infectiousness values, what I see when I run the model is a periodic variation in the numbers of the three classes of people. Look at the plot in the lower left corner, and you'll see this. What is happening, I believe, is this:
1. When there are many healthy, non-immune people, that means that there are many people who can get infected, so the number of infected people goes up, and the number of healthy people goes down.
2. Soon after that, the number of sick, infectious people goes down, because they either die or become immune.
3. Since there are now more immune people, and fewer infectious people, the number of non-immune, healthy people grows; they are reproducing. (See "How it works" in the Info tab.) But now we have returned to the situation in step 1, ... so the cycle continues.
If your model is sufficiently similar to the Models Library Virus model, I'd bet that this is part of what's happening. If you don't have a plot window like the Virus model, I recommend adding it.
Also, you didn't say how many ticks you are running the model for. If you run it for only a small number of ticks, you won't notice the periodic behavior, but that doesn't mean it hasn't begun.
What this all means is that increasing infectiousness won't necessarily increase the maximum number infected: a faster rate of infection means that the number of individuals who can be infected drops faster. I'm not sure that the maximum number infected over the whole run is an interesting number with this model and a high infectiousness value. It depends on what you are trying to understand.
One of the great things about NetLogo and some other ABM systems is that you can watch the system evolve over time, using various tools such as plots, monitors, etc. as well as just looking at the agents move around or change states over time. This can help you understand what is going on in a way that a single number like an average won't. Then you can use this insight to figure out a more informative way of measuring what is happening.
Another model where you can see a similar kind of periodic pattern is Wolf-Sheep Predation. I recommend looking at that. It may be easier to understand the pattern. (If you are interested in mathematical models of this kind of phenomenon, look up Lotka-Volterra models.)
(Real virus transmission can be more complicated, because a person (or other animal) is a kind of big "island" where viruses can reproduce quickly. If they reproduce too quickly, this can kill the host, and prevent further transmission of the virus. Sometimes a virus that reproduces more slowly can harm more people, because there is time for its hosts to infect others. This blog post by Elliott Sober gives a relatively simple mathematical introduction to some of the issues involved, but his simple mathematical models don't take into account all of the complications involved in real virus transmission.)
EDIT: You added a comment Lawan, saying that you are interested in modeling COVID-19 transmission. This paper, Variation and multilevel selection of SARS‐CoV‐2 by Blackstone, Blackstone, and Berg, suggests that some of the dynamics that I mentioned in the preceding remarks might be characteristic of COVID-19 transmission. That paper is about six months old now, and it offered some speculations based on limited information. There's probably more known now, but this might suggest avenues for further investigation.
If you're interested, you might also consider asking general questions about virus transmission on the Biology Stackexchange site.
Using the Road Traffic Library, I have created a 4-way intersection that is stop-controlled, and I need to measure the average delay for each approach. Currently I am using timeMeasureStart and timeMeasureEnd blocks, measuring the time from when the car enters the road/model until it exits the intersection.
Instead, I want to measure from the time the car slows to 40km/h, until it exits the intersection. Any suggestions?
The initial speed of all cars entering the model is 60 km/hr.
Sure, there is no pre-defined way but this custom approach should work:
make sure your cars are a custom agent type, not the default cars. Let's name it MyCar
add a variable myTimeBelow40 into MyCar of type double
check the car speed regularly (every 0.1 sec?!) in an event checkSpeed in MyCar. Use the getSpeed() function. If it is below 40 KPH, log the current time into myTimeBelow40 (only the first time it drops below 40, so it isn't overwritten on later checks)
log the departure time of your car: the time since it dropped below 40 KPH is the difference
Finally, add a statistic across your car population or log your individual car duration into main
@Benjamin thank you for pointing me in the right direction.
Here is my solution, guided by the suggestion. I'm sure it could be refined but in the end it is what worked for me and my limited knowledge of AnyLogic.
For a 4-way intersection, I wanted the delay for each approach, so I created 4 custom car agent types, with each population starting out empty.
In each agent, I had 2 variable blocks - var_Start and var_Slow - and one event block set to Timeout mode, Cyclic, with the first occurrence at time() and recurring in intervals of 0.1 s. In the event action, I specified the following:
if (getSpeed(KPH) <= 40) {
    var_Slow = time();   // record the moment the car first slows to 40 km/h or below
    event.suspend();     // suspend the cyclic check so var_Slow is not overwritten
                         // ("event" is assumed to be the name of the cyclic event block)
}
In main I used Histogram Data, labeled as dataDelay, and a chart with the mean showing, to see the results. I had one for each approach.
Back in the car agent, in actions on startup:
var_Start = time();   // record the time the car enters the model
and on destroy:
if (var_Slow == 0)   // the car never dropped below 40 km/h, so measure from entry
    main.dataDelay.add(time() - var_Start);
else                 // measure from the moment it first slowed to 40 km/h
    main.dataDelay.add(time() - var_Slow);
At the car source block in main, I kept the initial and preferred speed at 60; however, if there were cars backed up, then new cars were often initiated at a slower speed, sometimes already below 40 kph, hence the if/else code on destroy.
I had everything labeled according to its corresponding approach direction, unlike the simplified version I have here.
I'm trying to train a reinforcement learning agent to play an endless runner game using Unity-ML.
The game is simple: an obstacle is approaching from the side and the agent has to jump at the right timing to overcome it.
As the observation, I have the distance to the next obstacle. Possible actions are 0 - idle; 1 - jump. Rewards are given for longer playtime.
Unfortunately, the agent fails to learn to overcome even the 1st obstacle reliably. I guess this is due to the high imbalance between the two actions, as the ideal policy would be doing nothing (0) most of the time and jumping (1) only at very specific points in time. Additionally, all actions during a jump are meaningless since the agent cannot jump while in the air.
How can I improve the learning such that it converges nevertheless? Any suggestions on what to look into?
Current trainer config:
EndlessRunnerBrain:
gamma: 0.99
beta: 1e-3
epsilon: 0.2
learning_rate: 1e-5
buffer_size: 40960
batch_size: 32
time_horizon: 2048
max_steps: 5.0e6
Thanks!
It's difficult to say without seeing the exact code that's being used for the reinforcement learning algorithm. Here are some steps worth exploring:
How long are you letting the agent train? Depending on the complexity of the game environment, it very well may take thousands of episodes for the agent to learn to avoid its first obstacle.
Experiment with the Frameskip property of the Academy object. This permits the agent to only take an action after a number of frames have passed. Increasing this value may increase the speed of learning in more simple games.
Adjust the learning rate. The learning rate determines how heavily the agent weights new information versus old information. You're using a very small learning rate; try increasing it by a couple of orders of magnitude (e.g., from 1e-5 to 1e-3).
Adjust epsilon. Epsilon determines how often a random action is taken. Given a state and an epsilon of 0.2, your agent will take a random action 20% of the time. The other 80% of the time, it will choose the action with the highest estimated value for that state. You can try reducing or increasing this value to see if you get better results. Since you know you'll want more random actions at the beginning of training, you can even "decay" epsilon with each episode: if you start with an epsilon value of 0.5, after each game episode is completed, reduce epsilon by a small value, say 0.00001 or so (see the sketch after this list).
Change the way the agent is rewarded. Instead of rewarding the agent for each frame it stays alive, perhaps you could reward the agent for each obstacle it successfully jumps over.
Are you sure that the given time_horizon and max_steps provide enough runway for the game to complete an episode?
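As a rough illustration of the epsilon decay idea mentioned above (plain Python, not specific to the ML-Agents trainer config; choose_action, q_values and the decay constants are placeholders):

import random

epsilon = 0.5        # start with 50% random actions
epsilon_min = 0.05   # keep a little exploration forever
decay = 1e-5         # subtracted after every completed episode

def choose_action(state, q_values, n_actions):
    # epsilon-greedy: explore with probability epsilon, otherwise act greedily
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: q_values.get((state, a), 0.0))

def end_of_episode():
    # decay exploration a little after each episode
    global epsilon
    epsilon = max(epsilon_min, epsilon - decay)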
Hope this helps, and best of luck!
I am making an AI for a 4-player board game. It's not strictly zero-sum (the 4 players "die" when they lose all their lives, so there will be a player who died first, second, and third, and a player who survived; however, I am telling the AI that only surviving counts as a win and anything else is a loss). After some research, I figured I would use a minimax algorithm in combination with a heuristic function. I came across this question and decided to do the same as the OP of that question - write an evolutionary algorithm that gives me the best weights.
However, my heuristic function is different from the one the OP of that question had. Mine takes 9 weights and is a lot slower, so I can't let the agents play 1000 games (takes too much time) or breed them with the crossover method (how do I do a crossover with 9 weights?).
So I decided to come up with my own method of determining fitness and breeding. And this question is only about the fitness function.
Here are my attempts at this.
First Attempt
For each agent A in a randomly generated population of 50 agents, select 3 more agents from the population (with replacement, but not the same agent as A itself) and let the 4 agents play a game where A is the first player. Select another 3 and play a game where A is the second player, and so on. For each of these 4 games, if A died first, its fitness does not change. If A died second, its fitness is increased by 1. If it died third, its fitness is increased by 2. If it survived, its fitness is increased by 3. Therefore, the highest fitness one can get is 12 (surviving/winning all 4 games -> 3 + 3 + 3 + 3).
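In code, the scoring rule of this first attempt could look roughly like this (a sketch; play_game is a placeholder assumed to return the finishing order from first eliminated to survivor):

import random

def fitness_of(agent, population, play_game):
    # One game in each of the four seats, with three opponents drawn at random
    # (with replacement, excluding the agent itself). Placement points:
    # died first = 0, died second = 1, died third = 2, survived = 3. Max = 12.
    score = 0
    others = [a for a in population if a is not agent]
    for seat in range(4):
        opponents = [random.choice(others) for _ in range(3)]
        players = opponents[:seat] + [agent] + opponents[seat:]
        finishing_order = play_game(players)  # [died first, ..., survivor]
        score += finishing_order.index(agent)
    return score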
I ran this for 7 generations, and starting from the first generation, the highest fitness was as high as 10. I also calculated the average fitness of the top 10 agents, but the average didn't increase at all throughout the 7 generations. It even decreased a little.
I think the reason this didn't work is that a few agents must have gotten lucky and been paired with some poorly performing agents as their opponents.
Second Attempt
The game setups are the same as in my first attempt, but instead of measuring the results of each game, I decided to measure how many moves the agent made before it died.
After 7 generations, the average fitness of the top 10 does increase, but still not as much as I think it should.
I think the reason this failed is that the game is finite, so there is a finite number of moves you can make before you die, and the top performing agents pretty much reached that limit. There is no room for growth. Another reason is that the fitness of the player who survived and the fitness of the player who died third differ little.
What I want
From my understanding of EAs (correct me if I'm wrong), the average fitness should increase and the top performing individual's fitness should not decrease over time.
My two attempts failed at both of these. Since the opponents are randomly selected, the top performing agent in generation 1 might get stronger opponents in the next generation, and thus its fitness decreases.
Notes
In my attempts, the agents play 200 games each generation and each generation takes up to 3 hours, so I don't want to let them play too many games.
How can I write a fitness function like this?
Seven generations doesn't seem like nearly enough to get a useful result. Especially for a game, I would expect something like 200+ generations to be more realistic. You could do a number of things:
Implement elitism in order to ensure the survival of the best individual(s) (see the sketch after this list).
The strength of evolution stems from repeated mutation and crossover, so I'd recommend letting the agents play only a few games per generation (say, 5 ~ 10), at least at the beginning, and then evolve the population. You might even want to do only one game per generation.
In this regard, you could adopt a continuous evolution strategy. What this means is that as soon as an agent dies, it is subjected to mutation, and as soon as an agent wins, it can produce offspring. Or any combination of the two. The point is that the tournament is ongoing; everyone can play against anyone else. This is a little more "organic" in the sense that it does not have strictly defined generations, but it should speed up the process (especially if you can parallelise the evaluation).
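A minimal sketch of elitism in a generational loop (the fitness and mutate callables and elite_count are placeholders for your own evaluation and mutation of the 9 weights):

import random

def next_generation(population, fitness, mutate, elite_count=2):
    # Keep the best individuals unchanged (elitism) and fill the rest of the
    # new population with mutated copies of parents drawn from the better half.
    ranked = sorted(population, key=fitness, reverse=True)
    elites = ranked[:elite_count]
    children = []
    while len(elites) + len(children) < len(population):
        parent = random.choice(ranked[:max(1, len(ranked) // 2)])
        children.append(mutate(parent))
    return elites + children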
I hope that helps. The accepted answer in the post you referenced has a good suggestion about the way you could implement crossover.