I am using the Q-learning algorithm on a simulation. This simulation has a limited number of iterations (600 to 700). The learning process is activated over several runs of this simulation (100 runs).
I am new to reinforcement learning, and I have a question about how to handle exploration/exploitation in this kind of simulation (I am using epsilon-greedy exploration).
I am using a decreasing exploration rate, and I am wondering whether I should decrease it across the whole set of simulation runs, or reset and decrease it within each run (initialize epsilon to 0.9 at the start of each run and then decrease it).
Thank you
You won't need such a high initial epsilon. It might be better to initialize the Q-values to be very high, so that unexplored Q-values are always picked over Q-values that have been explored at least once.
Considering your state space, it doesn't matter much whether you decrease epsilon over the whole set of runs or within each individual run, but decreasing it per run sounds like the better option.
How fast you decrease it will also depend on the dynamics of the environment and how fast the agent learns. I'm trying to make my alpha and epsilon correlate with the error, but that's tricky to get right.
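For what it's worth, here is a minimal Python sketch of both options (the state/action counts, decay rate, and the empty update step are all illustrative placeholders, not the asker's actual simulation):

import numpy as np

num_runs, num_steps = 100, 700          # 100 simulation runs, ~600-700 iterations each
num_states, num_actions = 50, 4         # placeholder sizes for a tabular problem

# Optimistic initialization: unvisited state-action pairs look attractive,
# so they get tried even with a modest epsilon.
Q = np.full((num_states, num_actions), 10.0)

epsilon, epsilon_min, decay = 0.3, 0.05, 0.95   # modest epsilon; the optimistic Q-values already drive exploration

def select_action(state):
    # epsilon-greedy: random action with probability epsilon, greedy otherwise
    if np.random.rand() < epsilon:
        return np.random.randint(num_actions)
    return int(np.argmax(Q[state]))

for run in range(num_runs):
    state = 0                            # placeholder initial state
    for step in range(num_steps):
        action = select_action(state)
        # ... environment step and Q-learning update would go here ...
    # Option A: decay epsilon once per run, so exploration shrinks over the 100 runs.
    epsilon = max(epsilon_min, epsilon * decay)
    # Option B (what the question describes with 0.9): reset epsilon to its starting
    # value at the top of each run and decay it per step inside the inner loop instead.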
I've seen comments in online articles/tutorials and Stack Overflow questions which suggest that increasing the number of epochs can result in overfitting. But my intuition tells me that there should be no direct relationship at all between the number of epochs and overfitting. So I'm looking for an answer which explains whether I'm right or wrong (or somewhere in between).
Here's my reasoning though. To overfit, you need to have enough free parameters (I think this is called "capacity" in neural networks) in your model to generate a function which can replicate the sample data points. If you don't have enough free parameters, you'll never overfit. You might just underfit.
So really, if you don't have too many free parameters, you could run infinite epochs and never overfit. If you have too many free parameters, then yes, the more epochs you have the more likely it is that you get to a place where you're overfitting. But that's just because running more epochs revealed the root cause: too many free parameters. The real loss function doesn't care about how many epochs you run. It existed the moment you defined your model structure, before you ever even tried to do gradient descent on it.
In fact, I'd venture as far as to say: assuming you have the computational resources and time, you should always aim to run as many epochs as possible, because that will tell you whether your model is prone to overfitting. Your best model will be the one that provides great training and validation accuracy, no matter how many epochs you run it for.
EDIT
While reading more into this, I realise I forgot to take into account that you can arbitrarily vary the sample size as well. Given a fixed model, a smaller sample size is more prone to being overfit. And then that kind of makes me doubt my intuition above. Still happy to get an answer though!
Your intuition seems completely correct to me.
But here is the caveat. The whole purpose of deep models is that they are "deep" (duh!!). So what happens is that your feature space gets exponentially larger as you grow your network.
Here is an example comparing a deep model with a simpler model:
Assume you have a 10-variable data set. With a crazy amount of feature engineering, you might be able to extract 50 features out of it. If you then fit a traditional model (say, a logistic regression), you will have 50 parameters (capacity in your terms, or degrees of freedom) to train.
But if you use even a very simple deep model with layer 1: 10 units, layer 2: 10 units, layer 3: 5 units, layer 4: 2 units, you end up with 10*10 + 10*10 + 10*5 + 5*2 = 260 weights to train (ignoring biases).
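As a quick sanity check of that count (weights only, biases ignored):

# 10 inputs -> 10 -> 10 -> 5 -> 2 units: multiply consecutive layer widths and sum.
layer_sizes = [10, 10, 10, 5, 2]
weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
print(weights)  # 10*10 + 10*10 + 10*5 + 5*2 = 260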
Therefore, when we train a neural net for a long time, we usually end up with a memorized version of our data set (this gets worse if the data set is small and easy to memorize).
But as you also mentioned, there is no intrinsic reason why a higher number of epochs should result in overfitting. Early stopping is usually a very good way to avoid this. Just set the patience to 5-10 epochs.
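For example, with tf.keras this could look roughly like the following (the toy data and model are purely illustrative, just to make the example self-contained):

import numpy as np
import tensorflow as tf

# Toy data and model for illustration only.
x = np.random.rand(1000, 20).astype("float32")
y = (x.sum(axis=1) > 10).astype("float32")

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Stop once validation loss has not improved for `patience` epochs,
# then roll back to the best weights seen so far.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)

model.fit(x, y, validation_split=0.2, epochs=1000,
          callbacks=[early_stop], verbose=0)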
If the number of trainable parameters is small with respect to the size of your training set (and your training set is reasonably diverse), then running over the same data multiple times will not matter much, since you will be learning features of your problem rather than just memorizing the training data set. The problem arises when the number of parameters is comparable to the size of your training data set (or bigger); it is basically the same problem as with any machine learning technique that uses too many features. This is quite common if you use large, densely connected layers. To combat this overfitting problem there are lots of regularization techniques (dropout, the L1 regularizer, constraining certain connections to be 0 or shared, as in a CNN).
The problem is that you might still be left with too many trainable parameters. A simple way to regularize even further is to use a small learning rate (i.e. don't learn too much from any particular example, lest you memorize it) combined with monitoring across epochs (if the gap between training and validation accuracy starts to grow, you are starting to overfit your model). You can then use that gap to stop your training. This is a version of what is known as early stopping (stopping before you reach the minimum of your loss function).
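A sketch of that gap-based check (the accuracy numbers below are made up just to show the logic; in practice they would come from your training loop or a Keras History object):

# Per-epoch accuracies (illustrative values only).
train_acc = [0.70, 0.78, 0.84, 0.90, 0.94, 0.97]
val_acc   = [0.69, 0.76, 0.80, 0.81, 0.80, 0.79]

max_gap = 0.05  # how much training accuracy may exceed validation accuracy
for epoch, (tr, va) in enumerate(zip(train_acc, val_acc)):
    if tr - va > max_gap:
        print(f"Epoch {epoch}: gap {tr - va:.2f}, probably starting to overfit, stop here")
        break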
Is there any intuitive explanation of why the learning rate needs to be decreased when the loss remains constant for several epochs? Won't this method make the network get stuck in a local minimum or on a plateau?
What helped me understand it was to assume that my loss function depends on only a single parameter, so that it could be something like a parabola. Now imagine that you are on the branch to the left of the minimum. If you choose the learning rate too large, you can step over the minimum and end up on the right branch. If you repeat this, you keep alternating between points on the two branches without ever reaching the minimum. BUT: if you decrease your learning rate now, you slowly get closer to the real minimum.
That means: if your cost remains relatively constant for some time, you may be stepping over a (local) minimum. Therefore you can try decreasing your step size.
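A tiny numeric illustration of that picture, using gradient descent on f(w) = w**2 (the step sizes are arbitrary):

# Gradient descent on f(w) = w**2; the gradient is 2*w.
def descend(lr, steps=6, w=1.0):
    trajectory = [w]
    for _ in range(steps):
        w = w - lr * 2 * w
        trajectory.append(round(w, 4))
    return trajectory

print(descend(lr=1.0))  # [1.0, -1.0, 1.0, -1.0, ...]: keeps jumping between the two branches
print(descend(lr=0.1))  # [1.0, 0.8, 0.64, 0.512, ...]: steadily approaches the minimum at 0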
And yes: with most methods you generally only find local minima. Since cost functions are normally multivariate, you start somewhere at random and try to find some kind of minimum. As you normally do several runs, it is likely that you end up in different local minima.
When training a neural network using batches, should accuracy (training and validation) increase after every epoch (i.e. after seeing the whole data set one additional time)?
I want to be able to quickly judge whether the network settings (learning rate, number of nodes, etc.) are reasonable. It also seems intuitive that the more often the whole dataset is seen, the better the performance should be.
So, if the performance decreases at some epoch, should I be worried that something is wrong (too high a learning rate, high bias)? (Or do I always have to wait several epochs to judge?)
I would say it depends on the dataset and the architecture. Fluctuations are normal, but in general the loss should improve. You can have a look at these practical guides for interpreting loss curves:
http://cs231n.github.io/neural-networks-3/#loss
https://blog.slavv.com/37-reasons-why-your-neural-network-is-not-working-4020854bd607
Yes, in a perfect world one would expect the test accuracy to increase. If the test accuracy starts to decrease, it might be that your network is overfitting. You might want to stop the learning just before you reach that point, or take other steps to counter the overfitting problem.
Also, it could be a result of noise in the test dataset, i.e. wrongly labeled examples.
In TensorFlow, I used to run CNN training for a fixed number of epochs and save checkpoints at a specified epoch interval. To evaluate the model, the checkpoints are restored and predictions are made on the validation dataset.
I want to automate the learning process instead of using a fixed number of epochs. Please explain how the loss value over mini-batches can be used to determine the stopping point. Also, please help me with implementing learning rate decay in TensorFlow. Which is better, constant or exponential decay, and how do I determine the decay factor?
First, for the number of iterations: you can exit training when your loss has stopped improving on the batches, i.e. when the difference between two loss values, AVERAGED across batches (to reduce batch-to-batch fluctuations), is less than some threshold.
But you have probably realized that the threshold is a hyperparameter too!
In fact there have been quite a few attempts to completely automate ML, but no matter what you do you still end up with some hyperparameters.
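Something along these lines, for instance (both the threshold and the example numbers are illustrative):

def should_stop(epoch_avg_losses, threshold=1e-4):
    """Stop when the improvement between the last two epoch-averaged losses
    falls below the threshold (which is itself a hyperparameter)."""
    if len(epoch_avg_losses) < 2:
        return False
    return epoch_avg_losses[-2] - epoch_avg_losses[-1] < threshold

# Losses averaged across all mini-batches of each epoch (made-up numbers):
print(should_stop([0.90, 0.55, 0.41]))        # False: still clearly improving
print(should_stop([0.402, 0.4015, 0.40149]))  # True: improvement is below 1e-4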
Second, the decay factor is used when you feel the loss has stopped improving and think that you are in a local minimum, oscillating in and out of the well without actually settling into it (this metaphor only really works in two dimensions, but I still find it useful).
Almost every time it is done in the literature it looks very hand-made: you train for, say, 200 epochs, you see that the loss has reached a plateau, so you decrease your learning rate with a step function (argument staircase=True in TF), and then repeat.
What is commonly used is to divide the learning rate by 10 (exponential decay), but as before it is quite arbitrary!
For details on how to implement learning rate decay in TF you can see dga's answer in this SO question.
It is pretty straightforward !
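For reference, a minimal sketch assuming TF 2.x / tf.keras (the linked SO answer uses the older TF 1.x API); the concrete numbers here are arbitrary:

import tensorflow as tf

# Multiply the learning rate by decay_rate every decay_steps training steps;
# staircase=True makes the drop happen in discrete steps instead of continuously.
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.1,
    decay_steps=10000,   # arbitrary: how many steps between decays
    decay_rate=0.1,      # i.e. divide the learning rate by 10 each time
    staircase=True,
)

optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)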
What can help with choosing the schedule and the values is cross-validation, but oftentimes you can simply look at your loss and do it by hand.
There is no silver bullet in deep learning; it is just trial and error.
I am using
net = newfit(in,out,lag(j),{'tansig','tansig'});
to generate a new neural network. The default value of the number of validation checks is 6.
I am training a lot of networks and this is taking a lot of time. I guess it doesn't matter if my results are a bit less accurate if they can be made considerably faster.
How can I train faster?
I believe one way might be to reduce the number of validation checks. How can I do that (in code, not using the GUI)?
Is there some other way to increase speed?
As I said, the increase in speed may come at a small loss of accuracy.
Just to extend @mtrw's answer: according to the documentation, training stops when any of these conditions occurs:
The maximum number of epochs is reached: net.trainParam.epochs
The maximum amount of time is exceeded: net.trainParam.time
Performance is minimized to the goal: net.trainParam.goal
The performance gradient falls below min_grad: net.trainParam.min_grad
mu exceeds mu_max: net.trainParam.mu_max
Validation performance has increased more than max_fail times since the last time it decreased (when using validation): net.trainParam.max_fail
The epochs and time constraints let you put an upper bound on the training duration.
The goal constraint stops training when the performance (error) drops below it, and usually lets you adjust the time/accuracy trade-off: less accurate results for faster execution.
min_grad is similar (the gradient tells you the strength of the "descent"): if the magnitude of the gradient is less than min_grad, training stops. The idea is that if the error function is not changing by much, we have reached a plateau and should probably stop training, since we are not going to improve by much.
mu, mu_dec, and mu_max are used to control the weight updating process (backpropagation).
max_fail is usually used to avoid over-fitting, not so much for speedup.
My advice: set time and epochs to the maximum that your application constraints allow (otherwise the results will be poor), and then control goal and min_grad to reach the desired speed/accuracy trade-off. Keep in mind that max_fail won't gain you any time, since it is mainly used to ensure good generalization power.
(Disclaimer: I don't have the neural network toolbox, so I'm only extrapolating from the Mathworks documentation)
It looks from your input parameters like you're using TRAINLM. According to the documentation, you can set the net.trainParam.max_fail parameter to change the number of validation checks.
You can set the initial mu value, as well as the increment and decrement factors. But this would require some insight into the expected answer and performance of the search.
For a blunter approach, you can also control the maximum number of iterations by setting the net.trainParam.epochs parameter to something less than its default of 100. You might also set the net.trainParam.time parameter to limit the training time in seconds.
You should probably set net.trainParam.show to NaN to skip any displays.
Neural nets are treated as objects in MATLAB. To access any parameter before (or after) training, you need to access the network's properties using the . operator.
In addition to mtrw's and Amro's answers, make MATLAB's Neural Network Toolbox documentation your new best friend. It will usually explain things in much better detail.