Why does my agent always take the same action in RL? - MATLAB

I'm trying to reproduce the work in the paper Demand Response for Home Energy Management Using Reinforcement Learning and Artificial Neural Network. I want to optimize the power consumption of home appliances. The action space consists of the different power ratings of the home appliances, and my reward function is -(power rating * electricity price).
I have trained an RL agent with the DQN algorithm in MATLAB. The agent has an action space to select from, but it always takes the same action irrespective of the state. I have checked my reward function, and the agent does not even select the action with the highest reward. Can anyone think of why the agent is behaving this way?
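In code, the reward described above amounts to the following (a minimal sketch; the names are illustrative, not from the paper):

def reward(power_rating, electricity_price):
    # Negative cost: cheaper consumption -> higher reward
    return -(power_rating * electricity_price)

Note that with this reward alone, the ranking of actions is the same in every state (the lowest power rating always wins for any positive price), so it may be worth checking whether the paper's full reward also includes a comfort or dissatisfaction term.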
My code:
[screenshots of the MATLAB code omitted]
What I'm getting while training:
[screenshot of the training progress omitted]
And my agent always takes the same power rating regardless of the state (electricity price). Why?

Related

How can I create a finite calling population model?

I am trying to simulate a finite calling population model in AnyLogic. My population consists of 10 agents and I want them to come back to the Source node after they have been served.
I thought about adding a condition with a SelectOutput node, but the Source node does not have any input. The best I came up with was to limit the number of customer arrivals to 10. However, in that case the model stops running after 10 arrivals, which is not an appropriate result.
What can I do to simulate this type of model in AnyLogic?
EDIT: I thought that making agents come back to the Source node could be a solution for building the finite calling population model. The main purpose of my question is to understand how I can build such a model in AnyLogic. Here is the description of the concept of the model.
You cannot send them back to a Source element, as it only acts to create agents.
However, you can send them back to blocks that come after the Source, as below:
Here, all agents created by the Source block will loop indefinitely through the Queue and Delay blocks.
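If it helps to see the same idea outside AnyLogic, here is a rough Python sketch of a finite calling population using the simpy library (all parameters are invented for illustration): 10 agents are created once and then loop forever between "thinking" and being served, which is exactly what routing them back into the Queue block achieves.

import random
import simpy

NUM_AGENTS = 10

def agent(env, name, server):
    # Created once, then loops forever: think -> queue -> service -> repeat,
    # mirroring the Source -> Queue -> Delay -> back-to-Queue flowchart.
    while True:
        yield env.timeout(random.expovariate(1 / 5.0))      # think time
        with server.request() as req:
            yield req                                        # wait in queue
            yield env.timeout(random.expovariate(1 / 2.0))   # service (Delay)
            print(f"{env.now:6.2f}: {name} served")

env = simpy.Environment()
server = simpy.Resource(env, capacity=1)
for i in range(NUM_AGENTS):
    env.process(agent(env, f"agent-{i}", server))
env.run(until=50)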

System dynamics SEIR infectious curve for 3 waves of Covid

Using system dynamics in AnyLogic, how can you model a simulation that gives an infectious curve of this nature (picture below) using SEIR?
[image of the desired three-wave infectious curve omitted]
I have tried to simulate it, but my graph just goes up and comes back down; it does not oscillate as in the attached picture.
I need to simulate something similar to that graph for my assignment.
There should be three types of events in your model.
First, let's call it "initial spread", is triggered at the start of your simulation.
Second, let's call it "winter season", is triggered annually in November/December.
Third, let's call it "mass vaccination" - you can decide when to trigger it and for which selection of your agents.
So the first two are global events, while the third is specific to some sub-population (this can make the third wave somewhat "toothy" if you trigger it at slightly different moments for different populations).
That is pretty much it.
Curious to see how your model will predict the fourth wave - the second winter season of your simulation. So keep us updated :)
There are a number of ways to model this. One of the simplest is to make one of your infection rate parameters time-dependent, so that the infection rate increases or decreases over time.
See the example below.
I took the SIR model from the Cloud https://cloud.anylogic.com/model/d465d1f5-f1fc-464f-857a-d5517edc2355?mode=SETTINGS
And simply added an event to change the Infectivity rate.
Changing the chart to show only infected people, the result now looked something like this.
(See the 3 waves that were created.)
You will obviously use a parameter optimization experiment to get the parameter settings as close to reality as possible.
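For intuition outside AnyLogic, here is a rough Python sketch of the same trick: an SEIR-style system whose infectivity parameter is raised by "events" at fixed times, plus waning immunity so susceptibles are replenished between waves. All rates here are invented for illustration; this is not the Cloud model.

import numpy as np
from scipy.integrate import odeint

def beta(t):
    # Time-dependent infectivity: a baseline plus two later "events"
    # (e.g. winter seasons) that raise the contact rate again.
    if t < 120:
        return 0.30
    if t < 240:
        return 0.45
    return 0.60

def seir(y, t, sigma=1/5.0, gamma=1/10.0, waning=1/90.0):
    S, E, I, R = y
    dS = -beta(t) * S * I + waning * R   # waning immunity refills S
    dE = beta(t) * S * I - sigma * E
    dI = sigma * E - gamma * I
    dR = gamma * I - waning * R
    return dS, dE, dI, dR

t = np.linspace(0, 360, 3601)
sol = odeint(seir, (0.99, 0.0, 0.01, 0.0), t)
infected = sol[:, 2]   # plotting this column shows three successive waves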

Policy network for the game 2048

I'm trying to implement a policy network agent for the game 2048 according to Karpathy's RL tutorial. I know the algorithm will need to play a batch of games, remember the inputs and actions taken, and normalize and mean-center the ending scores. However, I got stuck on the design of the loss function. How do I correctly encourage actions that lead to better final scores and discourage those that lead to worse ones?
When using softmax at the output layer, I devised something along these lines:
loss = sum((action - net_output) * reward)
where action is in one-hot format. However, this loss doesn't seem to do much; the network doesn't learn. My full code (without the game environment) in PyTorch is here.
For the policy network in your code, I think you want something like this:
loss = -(log(action_probability) * reward)
where action_probability is your network's output for the action performed at that timestep.
For example, if your network output a 10% chance of taking that action, but it yielded a reward of 10, your loss would be -(log(0.1) * 10), which (using the base-10 logarithm) is equal to 10.
But if your network already thought that was a good move and output a 90% chance of taking that action, you would have -(log(0.9) * 10), which is roughly equal to 0.45, affecting the network less.
It's worth noting that taking the log of a softmax output isn't numerically stable, and you might be better off using log_softmax in the final layer of your network.
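Putting that together, a minimal PyTorch sketch of the loss (assuming you collected the network outputs, the actions taken, and the normalized, mean-centered returns per timestep, as described in the question):

import torch
import torch.nn.functional as F

def policy_loss(logits, actions, rewards):
    # logits:  (batch, 4) raw outputs, one per 2048 move
    # actions: (batch,)   indices of the moves actually played
    # rewards: (batch,)   normalized, mean-centered returns
    log_probs = F.log_softmax(logits, dim=1)              # numerically stable
    taken = log_probs.gather(1, actions.unsqueeze(1)).squeeze(1)
    return -(taken * rewards).sum()                       # -(log pi(a|s) * R)

# Usage with dummy data:
logits = torch.randn(8, 4, requires_grad=True)
actions = torch.randint(0, 4, (8,))
rewards = torch.randn(8)
loss = policy_loss(logits, actions, rewards)
loss.backward()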

Alert in RAM/CPU Usage Detection in e-Commerce Server

Currently I'm building monitoring services for my e-commerce server, mostly focused on CPU/RAM usage. It's essentially anomaly detection on time-series data.
My approach is to build an LSTM neural network that predicts the next CPU/RAM value in the trend, and to compare the prediction error against the standard deviation multiplied by some factor (currently 10).
But in real life it depends on many different conditions, such as:
1- Maintenance time (during this period an "anomaly" is not an anomaly).
2- Sales during days off, holidays, etc., when increased RAM/CPU usage is of course normal.
3- If the percentage decrease in CPU/RAM is consistent over 3 observations (5 min, 10 min & 15 min) -> anomaly. But if it dropped 50% over 5 minutes and then barely changed over 10 minutes (-5% ~ +5%) -> not an anomaly.
Currently I detect anomalies with a formula like this:
isAlert = (Diff5m >= 10 && Diff10m >= 15 && Diff30m >= 40)
where Diff is the percentage difference in absolute value.
Unfortunately I didn't save my "pure" data for building the neural network; for example, whenever an anomaly was detected, I modified the record so that it no longer counts as an anomaly.
I would like to add some attributes to my model's input, such as isMaintenance, isPromotion, isHoliday, etc., but sometimes this leads to overfitting.
I would also like my NN to adjust its baseline over time, for example as my service becomes more popular.
Are there any hints for these aims?
Thanks
I would say that an anomaly is an unusual outcome, i.e. an outcome that's not expected given the inputs. As you've figured out, there are a few variables that are expected to influence CPU and RAM usage. So why not feed those to the network? That's the whole point of machine learning. Your network will make a prediction of CPU usage, taking into account the sales volume, whether there is (or was) a maintenance window, etc.
Note that you probably don't need an isPromotion input if you include actual sales volumes. The former is a discrete input and only captures a fraction of the information present in the totalSales input.
Machine learning definitely needs data. If you threw that away, you'll have to start capturing it again. As for adjusting the baseline, you can achieve that by overweighting recent input data.
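As a minimal sketch of these points (the feature names, like totalSales, are illustrative): feed the context in as inputs, flag an anomaly only when the actual value is far from the model's prediction, and weight recent samples more heavily when (re)training.

import numpy as np

def make_features(usage_window, total_sales, is_maintenance, is_holiday):
    # Context becomes an input, so the "expected" load during a sale or
    # a maintenance window is learned instead of special-cased.
    context = [total_sales, float(is_maintenance), float(is_holiday)]
    return np.concatenate([usage_window, context])

def is_anomaly(actual, predicted, residual_std, k=10):
    # k = the std multiplier from the question; alert only when the
    # prediction error is far outside its usual spread.
    return abs(actual - predicted) > k * residual_std

def recency_weights(n_samples, half_life=1000):
    # Overweight recent data so the learned baseline drifts as the
    # service grows in popularity.
    age = np.arange(n_samples)[::-1]
    return 0.5 ** (age / half_life)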

Create a new event in an output adapter (StreamInsight)

I have the following problem in StreamInsight. I have a query where new tasks from an order come in and trigger an output adapter to make a prediction. The output adapter writes the predicted task cycle time to a table (in Windows Azure). The prediction is based on neural networks and is plugged into the output adapter. After the prediction is written to the table, I want to do something else with all the predicted times. So in a second query I want to count the number of written tasks in a 5-minute time window. When the number of predicted values saved in the table equals the number of tasks in an order, I want to get all the predicted values from the table and make a prediction of the order cycle time.
For this idea I need to create a new event in my output adapter so I know the predicted time has been written to the table. But I don't think it's possible to enqueue new events into the StreamInsight server from an output adapter.
Maybe this figure makes the problem clear:
http://i40.tinypic.com/4h4850.jpg
Hope someone can help me.
Thanks, Carlo
First off, I'm assuming you are using pre-2.1 StreamInsight based on your use of the term "output adapter".
From what you've posted, I would strongly recommend that your adapters do either input or output, but not both. This cuts down on the complexity, makes the implementation easier, and depending on how you wrote the adapter, you now have a reusable piece of code in your solution.
If you want to send data from StreamInsight to your neural network prediction engine, you will need to write an output adapter to do that. Then I would create an input adapter that gets the results from the neural network prediction engine and enqueues the data into StreamInsight. After creating your stream from the prediction engine's input adapter, you can use dynamic query composition to share the stream with a Windows Azure storage output adapter and your next query.
If your neural network prediction engine can "push" data to your input adapter, that would be the way to do it. If not, you'll have to poll for results.
There is a lot more to this, but it's difficult to drill in to more specifics without more details.
Hope this helps.