Setting up OpenAI Gym - neural-network

I've been given a task to set up a toy OpenAI Gym environment that can only be solved by an agent with memory. The example I was given has two doors: at time t = 0 I'm shown either 1 or -1, and at t = 1 I can move to the correct door and open it.
Does anyone know how I should get started? I want to show that A2C or PPO can solve this using an LSTM policy. How do I go about setting up the environment, etc.?

To create a new environment in Gym format, it should implement the 5 methods defined in the gym.core file:
https://github.com/openai/gym/blob/e689f93a425d97489e590bba0a7d4518de0dcc03/gym/core.py#L11-L35
To lay this out in steps:
Define the observation space and action space for your environment, preferably using the gym.spaces module.
Write the step function, which performs the agent's action and returns a 4-tuple containing the next observation from the environment, the reward, done (a boolean indicating whether the episode is over), and an info dict with any extra information you want.
Write a reset function that reinitialises the episode to a random start state and returns the initial observation (only the observation, not a 4-tuple like step).
These functions are enough to run an RL agent on your environment.
You can skip the render, seed and close functions if you want.
For the task you have defined, you can model both the observation space and the action space with Discrete(2): 0 for the first door and 1 for the second door.
Reset would return, as its observation, which door has the reward.
The agent would then choose either door, 0 or 1.
It then performs an environment step by calling step(action), which returns the agent's reward and done = True, signifying that the episode is over.
Frankly, the problem you describe seems too simple to be a challenge for any reinforcement learning algorithm, but I assume you have given it as an example.
Remembering over longer horizons is usually harder.
You can read Gym's documentation and the source of its toy environments to see how to create one.
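Gym itself is a Python library, so the following is not Gym code. It is a minimal sketch in Java of the reset/step contract described above, applied to the two-door task. One assumption goes beyond the answer: the cue (-1 or +1) appears only in the observation returned by reset, and the observation after the first step is a blank 0. That way a policy that sees only the current observation can do no better than chance at the door step, while a policy that remembers the cue (e.g. an LSTM policy) can always pick the right door. The class and method names are hypothetical; a real Gym environment would subclass gym.Env and declare its spaces with gym.spaces.

```java
import java.util.Random;

/**
 * Hypothetical two-door memory task, sketched in Java to mirror
 * Gym's reset()/step() contract. This is NOT Gym's API.
 */
public class TwoDoorEnv {

    /** What step() returns: next observation, reward and done flag (info omitted). */
    public static final class StepResult {
        public final int observation;
        public final double reward;
        public final boolean done;

        StepResult(int observation, double reward, boolean done) {
            this.observation = observation;
            this.reward = reward;
            this.done = done;
        }
    }

    private final Random rng;
    private int correctDoor; // 0 = first door, 1 = second door
    private int t;           // time step within the current episode

    public TwoDoorEnv(long seed) {
        this.rng = new Random(seed);
    }

    /** Starts a new episode; the observation is the cue: -1 for door 0, +1 for door 1. */
    public int reset() {
        correctDoor = rng.nextInt(2);
        t = 0;
        return correctDoor == 0 ? -1 : 1;
    }

    /** Actions: 0 = first door, 1 = second door. */
    public StepResult step(int action) {
        if (action != 0 && action != 1) {
            throw new IllegalArgumentException("action must be 0 or 1");
        }
        if (t == 0) {
            // t = 0: the action is ignored and the cue disappears (blank observation 0)
            t = 1;
            return new StepResult(0, 0.0, false);
        }
        if (t == 1) {
            // t = 1: the door choice is scored and the episode ends
            t = 2;
            return new StepResult(0, action == correctDoor ? 1.0 : 0.0, true);
        }
        throw new IllegalStateException("episode is over; call reset() first");
    }

    public static void main(String[] args) {
        TwoDoorEnv env = new TwoDoorEnv(0L);
        double total = 0.0;
        for (int episode = 0; episode < 100; episode++) {
            int cue = env.reset();              // a memory-based policy stores the cue here
            int remembered = cue == 1 ? 1 : 0;
            env.step(0);                        // first step: the action is irrelevant
            total += env.step(remembered).reward;
        }
        // prints: average reward with memory: 1.0
        System.out.println("average reward with memory: " + total / 100);
    }
}
```

An RL library would drive the same loop, with the policy network replacing the hand-written "remember the cue" rule; lengthening the blank phase between cue and door makes the memory requirement harder.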

Related

In AnyLogic, is it possible to send an agent from one storage to another directly?

I have 2 storages (called storageA and storageB) and I want to move an agent (a pallet) from one to the other using forklifts. I have set up the following.
A pallet is created at a node and moved to storageA via 'store'. This part works fine. After a delay, the pallet is moved to storageB via 'store1'. This is when the following error occurs:
Exception during discrete event execution:
root.store1.seizeTrans.freeSpaceSendTo:
Path not found! {agent=2, source={level=level, pos=(1673.3333333333333, 3245.0, 0.0)}, target={level=level, pos=(1857.25, 3160.4845, 0.0)}}
It works if I replace 'store1' with a retrieve block and send the pallet to a node first. However, I would like to send the pallet directly to the other storage rather than via another location. Is this possible?
Please let me know if I have not provided enough information.
Thanks
Yeah, unfortunately you can't do that as far as I know. The solution I use is the following; it's not a super robust solution, but it has been OK in applications so far:
1. Place a retrieve block between your delay and your store1.
2. Use the agent you pick up as the destination.
3. In the "On seize" action of the retrieve block, write: agent.transporter=unit;
4. In the store1 block, give the task the highest priority.
5. In the store1 block, use a custom transporter choice: agent.transporter.equals(unit)
6. The dispatching policy in store1 should be nearest to the agent. Using only this dispatching policy, your model will work 99.999999% of the time; the problem occurs only if another task with higher priority arrives at the exact same time as the transporter is released in the retrieve block, which is rare but can happen. Doing all of the steps above ensures that the resource continues the task no matter what.
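The idea behind steps 3 and 5 (remember which unit picked the pallet up, then accept only that unit for the next task) can be illustrated outside AnyLogic in plain Java. The Forklift and Pallet classes and the chooseTransporter helper below are hypothetical stand-ins, not AnyLogic's API; in the model itself the same check is the expression agent.transporter.equals(unit) in the custom transporter choice.

```java
import java.util.List;
import java.util.Optional;

public class TransporterReservation {

    /** Hypothetical stand-in for a forklift resource unit. */
    static final class Forklift {
        final String name;
        Forklift(String name) { this.name = name; }
    }

    /** Hypothetical stand-in for the pallet agent with an extra 'transporter' field. */
    static final class Pallet {
        Forklift transporter; // set when the unit is seized in the retrieve block
    }

    /** Mirrors the "On seize" action: agent.transporter = unit; */
    static void onSeize(Pallet pallet, Forklift unit) {
        pallet.transporter = unit;
    }

    /** Mirrors the custom transporter choice: only the reserved unit is acceptable. */
    static boolean isAcceptable(Pallet pallet, Forklift unit) {
        return pallet.transporter != null && pallet.transporter.equals(unit);
    }

    /** Picks the first idle unit that passes the custom choice, if any. */
    static Optional<Forklift> chooseTransporter(Pallet pallet, List<Forklift> idleUnits) {
        return idleUnits.stream().filter(u -> isAcceptable(pallet, u)).findFirst();
    }

    public static void main(String[] args) {
        Forklift f1 = new Forklift("forklift1");
        Forklift f2 = new Forklift("forklift2");
        Pallet pallet = new Pallet();
        onSeize(pallet, f2);
        // forklift1 is listed first (e.g. it is nearer), but only forklift2 passes the check
        System.out.println(chooseTransporter(pallet, List.of(f1, f2)).get().name); // prints forklift2
    }
}
```

The point of the custom choice is that it overrides ordering by distance or priority: whichever unit the dispatcher would otherwise prefer, only the reserved one is accepted.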
I had the same question today, so I landed here. Luckily, in my case the whole process already worked after doing only the first two steps above. We can move an agent from one storage to another simply by setting the destination of the 'retrieve' block to the agent's coordinates and having it move independently instead of by fleets or resources. After that, we put the 'store' block.
Destination is: (x,y,z)
X: agent.getX()
Y: agent.getY()
Z: agent.getZ()
Note: after agents are retrieved to a specified coordinate, fleets no longer seem to follow the paths in the network.

AnyLogic - Assembler should stop working for 2 hours after 10 assemblies are done

The "Assembler" should stop working for 2 hours after 10 assemblies are done.
How can I achieve that?
There are many ways to do this, depending on what "stop working" means and what the implications are for the incoming parts, but here's one option.
Create a resource pool called machine; it will be used along with the technicians.
In the "On exit" action of the assembler, write the following (I use 9 instead of 10 because out.count() doesn't count an agent until it is completely out, so when the count is 9, you have produced 10):
if(self.out.count()==9){
    machine.set_capacity(0);          // take the machine away: assembler stops
    create_MyDynamicEvent(2, HOUR);   // schedule its return in 2 hours
}
In your dynamic event (which you have to create), add the following code:
machine.set_capacity(1);
A second option is to have a variable countAssembler that counts the number of items produced. Then:
In "On exit", write countAssembler++;
In "On enter delay", write the following:
if(countAssembler==10){
    self.suspend(agent);                     // hold the agent in the assembler
    create_MyDynamicEvent(2, HOUR, agent);   // resume it in 2 hours
}
In the dynamic event, write:
assembler.resume(agent);
Don't forget to add the parameter the dynamic event needs (the agent to resume).
Create a variable called countAssembler of type int and increment it as agents pass through the assembler. Also create a variable called assemblerStopTime, and record the time the assembler stops with assemblerStopTime=time().
Place a SelectOutputOut block before the assembler and let agents in if countAssembler is less than 10; otherwise, send them to a Wait block.
Now, to maintain the FIFO rule, in the first SelectOutputOut condition you also need to check whether there is any agent in the Wait block and whether the current time minus assemblerStopTime is greater than 2. If so, free that agent and send it to the assembler with the wait.free(0) function, and send the current agent to the Wait block. You also need to reset countAssembler to zero.
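Whichever option you choose, the intended timing can be checked with a small plain-Java sketch (hypothetical, not AnyLogic code). Assumptions: a fixed assembly time, times in hours, parts always available, and a pause that repeats after every 10 assemblies (the question does not say whether it should repeat). The counter is reset after each pause, which is the role of resetting countAssembler to zero above.

```java
import java.util.ArrayList;
import java.util.List;

public class AssemblerPauseSketch {

    static final int BATCH = 10;           // assemblies before a pause
    static final double PAUSE_HOURS = 2.0; // length of each pause

    /**
     * Returns the completion time (in hours) of each of 'jobs' assemblies,
     * assuming parts are always waiting and each assembly takes 'assemblyHours'.
     */
    static List<Double> completionTimes(int jobs, double assemblyHours) {
        List<Double> times = new ArrayList<>();
        double clock = 0.0;
        int count = 0; // plays the role of countAssembler
        for (int i = 0; i < jobs; i++) {
            clock += assemblyHours;
            times.add(clock);
            count++;
            if (count == BATCH) {
                clock += PAUSE_HOURS; // assembler stops working for 2 hours
                count = 0;            // reset so the pause repeats every 10 assemblies
            }
        }
        return times;
    }

    public static void main(String[] args) {
        List<Double> t = completionTimes(12, 0.5);
        System.out.println(t.get(9));  // prints 5.0 (10th assembly done)
        System.out.println(t.get(10)); // prints 7.5 (5.0 + 2.0 pause + 0.5)
    }
}
```

Comparing these hand-computed times with the model's output is a quick way to confirm that the off-by-one around out.count() is handled as intended.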

Is it possible to copy the current state of the statechart from one agent to another?

I am trying to make a model that simulates the contagion of COVID in public spaces using a mix of SEIR and pedestrian models.
In another question I asked how to use a static population. It was suggested that, before deleting an agent, a copy be saved in a list; after the first X agents have been generated, I want the next agent generated by the pedSource to be one from the list.
Currently, I take a random agent from the list and, if it is infected, send a message to the new agent so that it goes into the infected state. But by doing that, I reset the recovery timeout every time an agent enters the zone I am modeling.
This is the code that currently runs in the pedSource's "On exit" action:
if (personasEnCasa.size()+personasEnSuper.size() > poblacionMaxima){
    Persona p = randomFrom(personasEnCasa);
    if (p.statechart.getState() == Persona.Infeccioso){
        send("Contagiado", ped);
    }
    personasEnCasa.remove(p);
}
personasEnSuper is my population of Persona, personasEnCasa is my list of agents outside the zone, and poblacionMaxima is the maximum number of agents in the list and the population combined.
I would like to copy the current statechart state of the agent in the list to the agent that my pedSource generates, or use something like pedSource.inject() but insert an agent from the list instead of a new one. But I don't know how to do it.
Is there any way to do this?
Your ped already exists, so you don't need to copy it; you can just move it into the flow, with pedWait being any pedestrian block you want. So instead of send("Contagiado", ped); you would do enter.take(ped);
But if you insist on using send, then you can use branches in your statechart to define where this ped goes:
In this case, before the send you will need ped.infectious=true; and the condition in the branch would be infectious==true to move to the infectious state.
As a side note, instead of p.statechart.getState() == Persona.Infeccioso you should use p.statechart.getState().equals(Persona.Infeccioso)
Use == only with primitives such as boolean, int and double; otherwise you are susceptible to errors that are very difficult to find.
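A minimal plain-Java illustration of that side note, independent of AnyLogic: for objects, == compares references, while equals compares values as the class defines them, so two objects with the same content can fail an == check.

```java
public class EqualsVsIdentity {
    public static void main(String[] args) {
        String a = "Infeccioso";
        String b = new String("Infeccioso"); // same characters, different object

        System.out.println(a == b);      // false: different references
        System.out.println(a.equals(b)); // true: same value

        Integer x = 1000;
        Integer y = 1000;
        System.out.println(x.equals(y)); // true
        // x == y is not reliable here: Integer caching is only guaranteed for -128..127
    }
}
```

The danger is that == sometimes appears to work (interned strings, small cached Integers), which is exactly why the resulting bugs are hard to spot.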

Difficult to find the current location of agents in an AnyLogic simulation

I built a simple model of pedestrian movement from a start line towards a target line. I want to find the number of moving agents in some area using XY coordinates (from X=150 to X=350; Y is the same).
The action of the event is to count the agents in that area and set the value of the variable crowd1:
crowd1=count(agents(), p-> p.getX()>150 && p.getX()<350)
The problem is that it's always 0, even though the agents are moving in the simulation.
There are no agents in your environment because you haven't created any agent type. For your code to work, you need a population of pedestrians registered in your environment (meaning you have to create the agent type and add it to Main as a population), and then you have to add the agents created in pedSource to that custom population.
Otherwise, you can use this code:
count(pedGoTo.getPeds(),p->p.getX()>150 && p.getX()<350)
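The filtering itself is ordinary Java. As a sketch outside AnyLogic, with a hypothetical Ped class holding x and y, the same count can be written with streams; inside AnyLogic the collection would come from pedGoTo.getPeds() or from your population instead. Note that the bounds are strict, as in the original condition, so pedestrians exactly at X=150 or X=350 are not counted.

```java
import java.util.List;

public class CrowdCount {

    /** Hypothetical pedestrian with a position; stands in for AnyLogic's ped agents. */
    static final class Ped {
        final double x;
        final double y;
        Ped(double x, double y) { this.x = x; this.y = y; }
        double getX() { return x; }
        double getY() { return y; }
    }

    /** Counts pedestrians strictly between xMin and xMax (y is ignored). */
    static long countInXRange(List<Ped> peds, double xMin, double xMax) {
        return peds.stream().filter(p -> p.getX() > xMin && p.getX() < xMax).count();
    }

    public static void main(String[] args) {
        List<Ped> peds = List.of(new Ped(100, 5), new Ped(200, 5), new Ped(349.9, 5), new Ped(350, 5));
        System.out.println(countInXRange(peds, 150, 350)); // prints 2
    }
}
```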

AnyLogic: How to conditionally close/open a valve

I am very new to AnyLogic. I have a simple model using the Fluid dynamics library: two tanks and a valve between them. The valve has to open at a rate, say X, only when the amount in the first tank, tank_1, is twice the amount in the second tank, tank_2.
Could you please help me with that?
Regards
You probably have more conditions on what to do with the valve depending on other things. But it's a bad idea to make something happen only when tank_1 is exactly 2 times tank_2: with continuously changing amounts, exact equality will practically never hold at the moment your check runs. Instead, create a boolean variable that tells you whether tank_1 is currently below or above 2*tank_2. Let's call it "belowTank2". I will assume tank_1 is below 2*tank_2 at the beginning of the simulation, so belowTank2 is initially true.
Then create an event that runs cyclically every second (or more often if you want) with the following code:
if(belowTank2){
    if(tank_1.amount()>2*tank_2.amount()){
        valve.set_openRate(0.1);   // tank_1 just rose above 2*tank_2
        belowTank2=false;
    }
}else{
    if(tank_1.amount()<2*tank_2.amount()){
        valve.set_openRate(0.3);   // tank_1 just fell back below 2*tank_2
        belowTank2=true;
    }
}
So whenever tank_1 surpasses 2*tank_2, the rate change on the valve is triggered, and another change is triggered when tank_1 drops back below 2*tank_2.
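The switching logic of the cyclic event can be pulled out into a plain-Java class and tested without AnyLogic. This is a sketch: ValveController and its update method are hypothetical names, the rates 0.1 and 0.3 are the ones from the code above, and the actual valve.set_openRate(...) call stays in the event. Because the flag flips only when the threshold is crossed, the rate changes once per crossing instead of on every tick.

```java
public class ValveController {

    private boolean belowTank2 = true; // assumption: tank_1 < 2*tank_2 at the start

    /**
     * Called from the cyclic event with the current tank amounts.
     * Returns the new open rate when the threshold is crossed, or null if nothing changes.
     */
    public Double update(double tank1Amount, double tank2Amount) {
        double threshold = 2 * tank2Amount;
        if (belowTank2 && tank1Amount > threshold) {
            belowTank2 = false;
            return 0.1; // tank_1 just rose above 2*tank_2
        }
        if (!belowTank2 && tank1Amount < threshold) {
            belowTank2 = true;
            return 0.3; // tank_1 just fell back below 2*tank_2
        }
        return null; // no crossing since the last call
    }

    public static void main(String[] args) {
        ValveController c = new ValveController();
        System.out.println(c.update(10, 10)); // null: still below
        System.out.println(c.update(25, 10)); // 0.1: crossed upwards
        System.out.println(c.update(30, 10)); // null: already above
        System.out.println(c.update(15, 10)); // 0.3: crossed downwards
    }
}
```

In the event, a non-null result would be passed to the valve; a null result means the valve is left as it is.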