I am reading the following two examples
https://github.com/h2oai/h2o-3/blob/master/h2o-docs/src/product/tutorials/gbm/gbmTuning.ipynb
https://h2o-release.s3.amazonaws.com/h2o/rel-turing/10/docs-website/h2o-docs/grid-search.html
Both of them, when setting up the grid search, fix ntrees instead of feeding in a list of values, for example
[i * 100 for i in range(1, 11)].
Here is my question:
I am wondering whether that is because early stopping is applied over ntrees? For example, we can set ntrees = 1000 and
score_tree_interval = 100, and then it can evaluate the model
performance at 100, 200, ... up to 1000 trees. Do I understand correctly?
But if my grid search also includes learn_rate and max_depth, will
early stopping also evaluate against learn_rate and max_depth? I
mean, within the same number of trees (for example ntrees = 500), when it
evaluates the different learning rates [0.01, 0.015, 0.025, 0.05, 0.1],
will it stop somewhere in that list of learning rates?
In the
documentation of "stopping_tolerance" (http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/stopping_tolerance.html)
it says "the model will stop training after reaching three
scoring events in a row in which a model's misclassification value
does not improve by 1e-3". So what are the three scoring events? Are
they three different numbers of trees, or could they be the same number of
trees but different learning rates?
As Darren Cook mentioned in the comments, there is early stopping for each model you build and early stopping for the grid search.
For an individual GBM, ntrees (the number of trees) is tuned with early stopping (i.e. using the stopping_tolerance, stopping_rounds, and stopping_metric parameters specified within the algorithm). You can see this if you open up Flow and look at the scoring history plot of your individual model: the number of trees is on the x-axis.
For grid search you have the added layer of your hyperparameters. So if you set ntrees = 100 in your GBM model and you grid over learn_rate = [0.01, 0.015], you will build two models: one with ntrees = 100 and learn_rate = 0.01, and a second with ntrees = 100 and learn_rate = 0.015. Within each model, each scoring event corresponds to a different number of trees while the learning rate stays fixed.
So looking at your specific questions:
Yes, this is correct.
The grid will check whether there is any improvement across your different learning rates (0.01, 0.015, etc.) and max_depth values. So again, what you are saying is correct: the grid will stop if it is not seeing any improvement across the different learn_rate and max_depth values (i.e. it will not continue to build new models).
So here you need to separate the model and the grid search. An individual model will stop building (adding trees) if it doesn't see improvement after three scoring events, and here your learn_rate and max_depth are fixed while ntrees changes. Then we step out to the grid: the grid will stop building new models if it doesn't see a user-specified amount of improvement between the individual models it has built.
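To make the two layers concrete, here is a minimal sketch using the H2O Python API. It assumes a training H2OFrame named train with a response column "y", and the max_depth values are made up for illustration; the stopping parameters mirror the ones discussed above.

import h2o
from h2o.estimators import H2OGradientBoostingEstimator
from h2o.grid import H2OGridSearch

h2o.init()

# Per-model early stopping: each GBM grows up to 1000 trees, scores
# every 100 trees, and stops after 3 scoring events in a row with no
# improvement of at least stopping_tolerance in the stopping metric.
gbm = H2OGradientBoostingEstimator(
    ntrees=1000,
    score_tree_interval=100,
    stopping_rounds=3,
    stopping_metric="AUC",
    stopping_tolerance=1e-3,
)

# Grid-level early stopping: the search stops launching new models
# once the best models stop improving (stopping_* in search_criteria).
grid = H2OGridSearch(
    model=gbm,
    hyper_params={"learn_rate": [0.01, 0.015, 0.025, 0.05, 0.1],
                  "max_depth": [3, 5, 7]},
    search_criteria={"strategy": "RandomDiscrete",
                     "stopping_rounds": 5,
                     "stopping_metric": "AUC",
                     "stopping_tolerance": 1e-3},
)
grid.train(x=[c for c in train.columns if c != "y"], y="y", training_frame=train)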
Using system dynamics in AnyLogic, how can you model a simulation that will give an infectious curve of this nature (repeated waves, as in the attached picture) using SEIR?
I have tried to simulate it, but my graph just goes up and down; it does not oscillate as in the attached picture.
I need to simulate something similar to that graph for my assignment.
There should be three types of events in your model.
The first, let's call it "initial spread", is triggered at the start of your simulation.
The second, let's call it "winter season", is triggered annually in November/December.
The third, let's call it "mass vaccination", is up to you: you can decide when to trigger it and for which selection of your agents.
So the first two are global events, while the third is specific to some sub-population (this can make the third wave somewhat "toothy" if you trigger it at slightly different moments for different populations).
That is pretty much it.
Curious to see how your model predicts the fourth wave, the second winter season of your simulation. So keep us updated :)
There are a number of ways to model this. One of the simplest is to make one of your infection rate parameters a function of time, so that the infection rate increases or decreases as the simulation runs.
See the example below.
I took the SIR model from the AnyLogic Cloud: https://cloud.anylogic.com/model/d465d1f5-f1fc-464f-857a-d5517edc2355?mode=SETTINGS
and simply added an event that changes the Infectivity rate over time.
After changing the chart to show only infected people, the result looked something like this (note the 3 waves that were created).
You would then obviously use a parameter optimization experiment to get the parameter settings as close to reality as possible.
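For intuition only (this is not the AnyLogic model), here is a minimal Python sketch of the same idea: a plain SIR system extended with waning immunity, where the infectivity beta(t) is seasonally forced so that repeated waves appear. All parameter values are made-up illustrations.

import numpy as np
from scipy.integrate import odeint
import matplotlib.pyplot as plt

def beta(t):
    # Hypothetical "winter season" forcing: infectivity peaks once a year.
    return 0.12 + 0.05 * np.cos(2 * np.pi * t / 365.0)

def sirs(y, t, gamma, omega):
    s, i, r = y
    ds = -beta(t) * s * i + omega * r   # immunity wanes at rate omega
    di = beta(t) * s * i - gamma * i
    dr = gamma * i - omega * r
    return [ds, di, dr]

t = np.linspace(0, 3 * 365, 3000)       # three simulated years
y = odeint(sirs, [0.99, 0.01, 0.0], t, args=(0.1, 0.01))

plt.plot(t, y[:, 1])                    # plot the infected fraction only
plt.xlabel("day")
plt.ylabel("infected fraction")
plt.show()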
I am using parameter variation in AnyLogic (in a system dynamics model). I am interested in how one variable changes across the various iterations. The variable is binary: 0 when the supply of water is greater than demand, and 1 when supply is lower than demand. The parameters being varied are a given percentage decrease in outdoor irrigation, a given percentage decrease in indoor water use, and a given percentage of households that have rainwater harvesting systems. Visually, I need a time plot with time on the x-axis (10,950 days, i.e. 30 years) and the binary variable on the y-axis. This should essentially show which iteration pushes a 1 further into the future.
I have watched videos and seen how histograms and 2D data are used to visualize the results of the iterations, but this does not show which iteration produced which output specifically. Is there a way to first, visually show the output as I have described above and second, return the data for a specific iteration?
Many thanks!
Parameter variation experiments have "After iteration" and "After simulation run" actions that are executed after each iteration and after each simulation run respectively. There it is possible to access the values inside the simulation object after it has finished but before it is destroyed. There is also a getCurrentIteration() method, which can be used to control the parameter variation experiment and to retrieve the data.
For more detail please consult here, and see the "SIR Agent Based Calibration" example model in the AnyLogic example models library (Help -> Example Models).
I'm working on a feed-forward artificial neural network (FFANN) that will take input in the form of a simple calculation and return the result (acting as a pocket calculator). The outcome won't be exact.
The network is trained using a genetic algorithm on the weights.
Currently my program gets stuck at a local maximum at:
5-6% correct answers, with 1% error margin
30% correct answers, with 10% error margin
40% correct answers, with 20% error margin
45% correct answers, with 30% error margin
60% correct answers, with 40% error margin
I currently use two different genetic operators:
The first is basic selection: pick two random individuals from the population, name the one with the better fitness the winner and the other the loser. The loser receives one of the weights from the winner.
The second is mutation, where the loser from the selection receives a slight modification based on the number of resulting errors (the fitness is decided by counting correct and incorrect answers).
So if the network outputs a lot of errors, it will receive a big modification, whereas if it has many correct answers, we are close to an acceptable goal and the modification will be smaller.
So, to the question: what are some ways I can prevent my FFANN from getting stuck in local maxima?
Should I modify my current genetic algorithm to something more advanced with more variables?
Should I create additional mutation or crossover?
Or should I maybe try modifying my mutation variables to something bigger/smaller?
This is a big topic, so if I missed any information that could be needed, please leave a comment.
Edit:
Tweaking the mutation numbers to more suitable values has gotten me a better answer rate, but it is still far from acceptable:
10% correct answers, with 1% error margin
33% correct answers, with 10% error margin
43% correct answers, with 20% error margin
65% correct answers, with 30% error margin
73% correct answers, with 40% error margin
The network currently has a very simple three-layer structure with 3 inputs, 2 neurons in the only hidden layer, and a single neuron in the output layer.
The activation function used is tanh, which places values between -1 and 1.
The selection-type crossover is very simple and works like the following:
[a1, b1, c1, d1] // Selected as winner due to most correct answers
[a2, b2, c2, d2] // Loser
The loser ends up receiving one of the values from the winner, moved straight down into the same position, since I believe the position in the array (of weights) matters to how it performs.
The mutation is very simple, adding a very small value (currently somewhere between 0.001 and 0.01) to a random weight in the loser's array of weights, with a 50/50 chance of the value being negative.
Here are a few examples of training data:
1, 8, -7 // the -7 represents + (1+8)
3, 7, -3 // -3 represents - (3-7)
7, 7, 3 // 3 represents * (7*7)
3, 8, 7 // 7 represents / (3/8)
Use a niching technique in the GA. A useful one is fitness sharing: the score of every solution (some form of quadratic error, I think) is adjusted to take into account its similarity to the rest of the population. This maintains diversity inside the population and avoids premature convergence and traps in local optima.
Take a look here:
http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.100.7342
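For illustration, here is a minimal sketch of fitness sharing, one common niching technique; the Euclidean distance, sigma, and alpha below are hypothetical choices you would tune for your weight vectors.

import math

def shared_fitness(population, raw_fitness, sigma=1.0, alpha=1.0):
    # population: list of weight vectors; raw_fitness: unshared score.
    def dist(x, y):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    scores = []
    for x in population:
        # Niche count: how crowded x's neighborhood is (includes x itself,
        # so it is always >= 1 and the division below is safe).
        niche = sum(max(0.0, 1.0 - (dist(x, y) / sigma) ** alpha)
                    for y in population)
        scores.append(raw_fitness(x) / niche)
    return scores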
A common problem when using GAs to train ANNs is that the population becomes highly correlated as training progresses.
You could try increasing the mutation chance and/or its effect as the change in error decreases.
In plain English: the population becomes genetically similar due to crossover and fitness selection as a local optimum is approached. You can reintroduce variation by increasing the chance of mutation.
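A tiny sketch of that idea (the stall threshold and boost factor are made-up values to tune):

def mutation_size(prev_error, curr_error, base=0.005, boost=10.0):
    # Mutate harder when the error has stopped improving between
    # generations; otherwise keep the small baseline mutation.
    improvement = prev_error - curr_error
    if improvement < 1e-4:      # stalled near a local optimum
        return base * boost
    return base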
You can make a simple modification to the selection scheme: the population can be viewed as having a 1-dimensional spatial structure, a circle (consider the first and last locations to be adjacent).
The production of an individual for location i is permitted to involve only parents from i's local neighborhood, where the neighborhood is defined as all individuals within distance R of i. Aside from this restriction no changes are made to the genetic system.
It's only one or a few lines of code and it can help to avoid premature convergence.
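Adapted to the tournament scheme from the question, it could look like this minimal sketch (RADIUS is an assumed value to tune):

import random

RADIUS = 5  # neighborhood radius on the ring; a value to tune

def evolve_step(population, fitness):
    # population: list of weight lists; fitness: callable scoring one list.
    n = len(population)
    for i in range(n):
        # Draw both tournament entrants from i's ring neighborhood only.
        a = (i + random.randint(-RADIUS, RADIUS)) % n
        b = (i + random.randint(-RADIUS, RADIUS)) % n
        if fitness(population[a]) >= fitness(population[b]):
            winner, loser = a, b
        else:
            winner, loser = b, a
        # Same operators as in the question: the loser copies one of the
        # winner's weights (same position), then gets a small mutation.
        j = random.randrange(len(population[winner]))
        population[loser][j] = population[winner][j]
        k = random.randrange(len(population[loser]))
        population[loser][k] += random.choice([-1, 1]) * random.uniform(0.001, 0.01)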
References:
Lee Spector and Jon Klein, "Trivial Geography in Genetic Programming" (2005).
Hello wonderful community!
I'm currently writing a small game in my spare time. It takes place in a large galaxy, where the player has control of some number of stars. On these stars you can construct buildings, each of which takes some number (0..*) of inputs and produces some number of outputs. These buildings have a maximum capacity/throughput, and scaling down a building's inputs scales down its outputs by an equal amount. I'd like to find a budgeting algorithm that optimizes (or approximates) the throughput of all the buildings. It seems like some kind of max-flow problem, but none of the flow optimization algorithms I've read about handle differing types of inputs or dependent outputs.
The toy "tech tree" I've been playing with is:
Solar plant - no inputs => 2 energy
Extractor - 1 energy => 1 ore
Refinery - 1 energy, 1 ore => 1 metal
Shipyard - 1 metal, 2 energy => 1 ship
I'm willing to accept sub-optimal algorithms, and I'm willing to guarantee that the inputs/outputs have no cycles (they form a DAG from building to building). The idea is to allow reasonable throughput and tech-tree complexity without player intervention, because on the scale of hundreds or thousands of stars, letting the player manually define the budgeting strategy isn't fun and gives players who no-life it a distinct advantage.
My current strategy is to build up the DAG and give the resources a total ordering (ships are better than metal, which is better than ore, which is better than energy). Then, looping through the resources, I find the most "descendant" building that produces the current resource and let it greedily grab from its inputs recursively (a shipyard would take 2 energy and 1 metal, then the refinery would grab 1 energy and 1 ore, etc.). Next I find any "liars" in the graph (e.g. a solar plant providing 4 energy when its maximum is 2), scale down their production, and propagate the changes forward. Once everything is resolved for the DAG, I remove the terminal element (the shipyard) from the graph, subtract the current throughput of each edge from the maximum throughput of the building, and repeat the process for the next type of resource. I thought I'd ask people far more intelligent than me if there's a better way. :)
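For reference, here is a minimal sketch of how the toy tech tree above could be encoded as a DAG of buildings; the dict layout and the capacity field are assumptions for illustration, not part of the question.

# Each building converts inputs to outputs at a fixed ratio, up to its
# capacity; an activity level in [0, 1] scales inputs and outputs alike.
buildings = {
    "solar":     {"inputs": {},                        "outputs": {"energy": 2}, "capacity": 1.0},
    "extractor": {"inputs": {"energy": 1},             "outputs": {"ore": 1},    "capacity": 1.0},
    "refinery":  {"inputs": {"energy": 1, "ore": 1},   "outputs": {"metal": 1},  "capacity": 1.0},
    "shipyard":  {"inputs": {"energy": 2, "metal": 1}, "outputs": {"ship": 1},   "capacity": 1.0},
}

# Total ordering used by the greedy pass described above.
resource_priority = ["ship", "metal", "ore", "energy"]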
I've run an experiment and would like to fit a state space model to the data. Unfortunately I have little experience with how to implement this, so I was hoping to ask for some help.
In the experiment participants reach towards different targets. The participant receives feedback about their movement via an on screen cursor. This cursor displays their reaching movement, but is rotated by 30 degrees. This means participants initially make large errors, but reduce them with repeated practice.
The following data provides some illustrative results. Each value represents an 'epoch' (average of eight trials):
18.26
13.95
10.92
10.32
8.23
6.57
7.05
5.98
5.99
4.58
4.35
3.72
3.71
3.04
4.47
4.16
I have found a paper that used a similar experiment and fit a state space model to its data. The model is composed of two equations:

1) e(n) = p(n) - s(n) + E(n)
2) s(n+1) = s(n) + A * e(n)

where
e(n) = error on trial n (i.e. the values above)
p(n) = perturbation applied to the movement (i.e. 30 degrees)
s(n) = internal state of the system
E(n) = noise
A = rate of adaptation to the perturbation
The paper indicates that the authors used the MATLAB function nlinfit to fit this model, but I don't understand how I would do this. Any help would be greatly appreciated!
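The paper's fit was done with MATLAB's nlinfit, but the same idea can be sketched in Python with scipy. This is only a rough illustration under stated assumptions: p(n) is taken as a constant 30 degrees, E(n) is left as the fit residual, and both the adaptation rate A and the initial state s(1) are estimated.

import numpy as np
from scipy.optimize import least_squares

data = np.array([18.26, 13.95, 10.92, 10.32, 8.23, 6.57, 7.05, 5.98,
                 5.99, 4.58, 4.35, 3.72, 3.71, 3.04, 4.47, 4.16])
p = 30.0  # constant perturbation, per the question

def predicted_errors(A, s1, n_epochs):
    s, errors = s1, []
    for _ in range(n_epochs):
        e = p - s              # e(n) = p(n) - s(n), noise left as residual
        errors.append(e)
        s = s + A * e          # s(n+1) = s(n) + A * e(n)
    return np.array(errors)

def residuals(params):
    A, s1 = params
    return predicted_errors(A, s1, len(data)) - data

fit = least_squares(residuals, x0=[0.1, 10.0],
                    bounds=([0.0, 0.0], [1.0, 30.0]))
print("adaptation rate A =", fit.x[0], ", initial state s(1) =", fit.x[1])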
I've just seen your post now, ages later, but I came across it while looking into a problem of my own.
From experience, I know that if you have a system that you want to obtain a state space model for, and you have measured inputs and corresponding measured outputs from your system, you can use the pem function, which will build a state space model based on your measurements.
The pem function is part of MATLAB's System Identification Toolbox.