What is the maximum number of patches? - NetLogo

I am messing about with GIS and whatnot, so I want to have as many patches as I can. Alas, when I go too large (3600 x 1800) I end up with a crash and a verbose error starting with an InvocationTargetException.
Is this limit system-based, or is it in the NetLogo code?
How big can I go?

This is answered in the NetLogo FAQ at http://ccl.northwestern.edu/netlogo/docs/faq.html#howbig :
The NetLogo engine has no fixed limits on size [...]
In practice, the main limit is memory. Are you sure you need such a high patch resolution?
In cases where your model has turtles sampling raster data, you can use high-resolution GIS raster datasets while keeping the patch resolution low, and have the turtles sample the dataset on the fly instead of importing it into patch variables. Look closely at how the GIS Gradient Example is written, in Models Library > Code Examples > GIS. The general idea is sketched below.
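The same trick works in any language: keep the simulation grid coarse, hold the full-resolution raster in memory as an array, and index into it from agent coordinates on demand. A minimal sketch in Python with NumPy (the array contents and coordinate names are made up for illustration):

```python
import numpy as np

# Full-resolution raster, e.g. 3600 x 1800 values loaded from a GIS file.
raster = np.random.rand(1800, 3600)  # placeholder data

# Coarse simulation world, e.g. 360 x 180 "patches".
WORLD_W, WORLD_H = 360, 180

def sample_raster(x, y):
    """Map a coarse world coordinate to the raster and read the value on the fly."""
    col = int(x / WORLD_W * raster.shape[1])
    row = int(y / WORLD_H * raster.shape[0])
    return raster[row, col]

# An "agent" at world position (123.4, 56.7) reads the high-resolution data
# without the world ever storing 3600 x 1800 patch variables.
value = sample_raster(123.4, 56.7)
```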


In what order should we tune hyperparameters in Neural Networks?

I have a fairly simple ANN using TensorFlow and AdamOptimizer for a regression problem, and I am now at the point of tuning all the hyperparameters.
For now, I have found many different hyperparameters that I need to tune:
Learning rate: initial learning rate, learning rate decay
The AdamOptimizer needs 4 arguments (learning rate, beta1, beta2, epsilon), so we need to tune them - at least epsilon
Batch size
Number of iterations
Lambda, the L2-regularization parameter
Number of neurons, number of layers
Kind of activation function for the hidden layers and for the output layer
Dropout parameter
I have 2 questions:
1) Do you see any other hyperparameters I might have forgotten?
2) For now, my tuning is quite "manual", and I am not sure I am doing everything properly. Is there a special order in which to tune the parameters? E.g. learning rate first, then batch size, then...
I am not sure that all these parameters are independent - in fact, I am quite sure that some of them are not. Which ones are clearly independent and which ones are clearly not? Should the dependent ones be tuned together?
Is there any paper or article that talks about properly tuning all the parameters in a special order?
EDIT :
Here are the graphs I got for different initial learning rates, batch sizes, and regularization parameters. The purple curve is completely weird to me: its cost decreases far more slowly than the others', yet it gets stuck at a lower accuracy. Is it possible that the model is stuck in a local minimum?
[Plots: accuracy and cost curves for each setting]
For the learning rate, I used the decay:
LR(t) = LRI / sqrt(epoch), where LRI is the initial learning rate.
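In code, that schedule is simply this (a sketch; LRI = 0.001 is just an example value):

```python
import math

LR_INITIAL = 0.001  # example initial learning rate (LRI)

def learning_rate(epoch):
    """LR(t) = LRI / sqrt(epoch); epochs are counted from 1 to avoid division by zero."""
    return LR_INITIAL / math.sqrt(epoch)

# learning_rate(1) == 0.001, learning_rate(4) == 0.0005
```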
Thanks for your help !
Paul
My general order is:
Batch size, as it will largely affect the training time of future experiments.
Architecture of the network:
Number of neurons in the network
Number of layers
Rest (dropout, L2 reg, etc.)
Dependencies:
I'd assume that the optimal values of
learning rate and batch size
learning rate and number of neurons
number of neurons and number of layers
strongly depend on each other. I am not an expert in that field, though.
As for your hyperparameters:
For the Adam optimizer: "Recommended values in the paper are eps = 1e-8, beta1 = 0.9, beta2 = 0.999." (source)
For the learning rate with Adam and RMSProp, I found values around 0.001 to be optimal for most problems.
As an alternative to Adam, you can also use RMSProp, which reduces the memory footprint by up to 33%. See this answer for more details.
You could also tune the initial weight values (see All you need is a good init). That said, the Xavier initializer seems to be a good way to avoid having to tune the weight initialization.
I don't tune the number of iterations/epochs as a hyperparameter. I train the net until its validation error converges (see the sketch below); however, I give each run a time budget.
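Putting the recommended Adam values and the converge-then-stop rule together, a minimal sketch (assuming the tf.keras API; the model, data, and patience value are placeholders):

```python
import tensorflow as tf

# Recommended values from the Adam paper: eps = 1e-8, beta1 = 0.9, beta2 = 0.999,
# with a learning rate around 1e-3 as a common starting point.
optimizer = tf.keras.optimizers.Adam(
    learning_rate=1e-3, beta_1=0.9, beta_2=0.999, epsilon=1e-8)

# Train until the validation error converges instead of fixing the epoch count,
# but keep a budget via a generous epoch cap.
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=10, restore_best_weights=True)

# model.compile(optimizer=optimizer, loss="mse")
# model.fit(x_train, y_train, validation_split=0.2,
#           epochs=1000, callbacks=[early_stop])
```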
Get TensorBoard running and plot the error there. You'll need to create subdirectories in the path where TensorBoard looks for the data to plot; I do that subdirectory creation in the script. So I change a parameter in the script, give the trial a name there, run it, and plot all the trials in the same chart. You'll very soon get a feel for the most effective settings for your graph and data.
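A sketch of that subdirectory-per-trial setup (assuming tf.keras and its TensorBoard callback; the trial naming scheme is just one possibility):

```python
import os
import tensorflow as tf

# One subdirectory per trial; encoding the changed parameters in the name makes
# every trial show up as a separately labelled curve in the same TensorBoard chart.
trial_name = "lr0.001_bs64"  # hypothetical trial label
log_dir = os.path.join("logs", trial_name)

tb_callback = tf.keras.callbacks.TensorBoard(log_dir=log_dir)
# model.fit(..., callbacks=[tb_callback])
# Then launch: tensorboard --logdir logs
```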
For parameters that are less important you can probably just pick a reasonable value and stick with it.
Like you said, the optimal values of these parameters all depend on each other. The easiest thing to do is to define a reasonable range of values for each hyperparameter, then randomly sample a value from each range and train a model with that setting. Repeat this a bunch of times and pick the best model. If you are lucky, you will be able to analyze which hyperparameter settings worked best and draw some conclusions from that.
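A bare-bones sketch of that random-search loop (the ranges are examples, and train_and_evaluate is a placeholder for your training run returning a validation error):

```python
import random

def sample_config():
    """Draw one hyperparameter setting; scale parameters are sampled log-uniformly."""
    return {
        "learning_rate": 10 ** random.uniform(-5, -2),
        "batch_size": random.choice([32, 64, 128, 256]),
        "l2": 10 ** random.uniform(-6, -2),
        "dropout": random.uniform(0.0, 0.5),
    }

best_score, best_config = float("inf"), None
for _ in range(50):  # the number of trials is itself a budget choice
    config = sample_config()
    score = train_and_evaluate(config)  # placeholder: returns validation error
    if score < best_score:
        best_score, best_config = score, config

print(best_config, best_score)
```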
I don't know of any tool specific to TensorFlow, but the best strategy is to start with the basic hyperparameters, such as a learning rate of 0.01 or 0.001 and a weight decay of 0.005 or 0.0005, and then tune them. Doing it manually will take a lot of time; if you are using Caffe, the following is the best option, as it will take the hyperparameters from a set of input values and give you the best set:
https://github.com/kuz/caffe-with-spearmint
For more information, you can follow this tutorial as well:
http://fastml.com/optimizing-hyperparams-with-hyperopt/
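For reference, the hyperopt library that tutorial covers works roughly like this (a sketch; objective is a stand-in for your actual training run):

```python
from hyperopt import fmin, tpe, hp

# Search space: log-uniform learning rate and weight decay.
space = {
    "lr": hp.loguniform("lr", -12, -4),            # roughly e^-12 .. e^-4
    "weight_decay": hp.loguniform("wd", -12, -5),
}

def objective(params):
    # Placeholder: train a model with these params, return the validation loss.
    return train_and_evaluate(params)

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=50)
print(best)
```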
As for the number of layers: I suggest you first build a smaller network and increase the amount of data; once you have sufficient data, increase the model complexity.
Before you begin:
Set the batch size to the maximum (or maximum power of 2) that works on your hardware: simply increase it until you get a CUDA out-of-memory error (or system RAM usage above 90%).
Set regularizers to low values.
For the architecture and the exact numbers of neurons and layers, use known architectures as inspiration and adjust them to your specific performance requirements: more layers and neurons make a possibly stronger, but slower, model.
Then, if you want to do it one by one, I would go like this:
1. Tune the learning rate over a wide range (see the sketch after this list).
2. Tune the other parameters of the optimizer.
3. Tune the regularizers (dropout, L2, etc.).
4. Fine-tune the learning rate - it's the most important hyperparameter.
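For step 1, a minimal sketch of a wide-range learning-rate sweep (train_and_evaluate is a placeholder for one full training run):

```python
import numpy as np

# Sweep the learning rate over several orders of magnitude first,
# then zoom in around the best value when fine tuning in step 4.
for lr in np.logspace(-5, -1, num=9):    # 1e-5 ... 1e-1, log-spaced
    val_loss = train_and_evaluate(lr)    # placeholder training run
    print(f"lr={lr:.0e}  val_loss={val_loss:.4f}")
```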

Accuracy in Caffe stays at 0.1 and does not change

Throughout the entire training process, the accuracy stays at 0.1. What am I doing wrong?
Model, solver and part of log here:
https://gist.github.com/yutkin/3a147ebbb9b293697010
Topology in PNG format: [image not reproduced]
P.S. I am using the latest version of Caffe and a g2.2xlarge instance on AWS.
You're working on the CIFAR-10 dataset, which has 10 classes. When the training of a network commences, the first guesses are essentially random, so your accuracy is 1/N, where N is the number of classes. In your case that is 1/10, i.e., 0.1. If your accuracy stays the same over time, it implies that your network isn't learning anything.

This may happen due to a large learning rate. The basic idea of training a network is that you calculate the loss and propagate it back. The gradients are multiplied by the learning rate and subtracted from the current weights and biases. If the learning rate is too big, you may overshoot the local minimum every time; if it is too small, convergence will be slow.

I see that your base_lr here is 0.01. As far as my experience goes, this is somewhat large. You may want to keep it at 0.001 in the beginning and then reduce it by a factor of 10 whenever you observe that the accuracy is not improving. Anything below 0.00001 usually doesn't make much of a difference, though. The trick is to observe the progress of the training and make parameter changes as and when required.
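That reduce-on-plateau recipe, written out as a small sketch (plain Python pseudologic rather than an actual Caffe solver file; the training and evaluation calls are placeholders):

```python
base_lr, min_lr = 0.001, 1e-5   # start small; below ~1e-5 changes rarely matter
best_acc = 0.0

for epoch in range(100):                # epoch budget
    train_one_epoch(lr=base_lr)         # placeholder: one pass over the data
    acc = evaluate()                    # placeholder: validation accuracy
    if acc > best_acc + 1e-3:           # still improving: keep the current rate
        best_acc = acc
    elif base_lr > min_lr:              # stuck: drop the learning rate by 10x
        base_lr /= 10.0
```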
I know the thread is quite old, but maybe my answer helps somebody. I experienced the same problem, with accuracy like a random guess.
What helped was to set the number of outputs of the last layer before the accuracy layer to the number of labels.
In your case that should be the ip2 layer. Open the model definition of your net and set num_output to the number of labels.
See Section 4.4 for more information: A Practical Introduction to Deep Learning with Caffe and Python

How many agents can NetLogo support in a model?

Suppose the NetLogo world in a model is 160 x 101. In such a world, how many stationary and moving agents can possibly be created? Will this world be able to support 100,000 moving agents (none of which die), or is there a limit to the number of moving agents NetLogo can support in a single model?
There are no formal restrictions, just performance and resources. See How to model a very large world in NetLogo? I have personally modelled with 50,000 agents at a still-reasonable speed. However, get your model working with a MUCH smaller size before expanding, as it will slow down.

How to model a very large world in NetLogo?

I need to create a very large grid of patches to hold GIS information for a very large network (such as a city-wide network). My question is how to get NetLogo to model such a world. When I set max-pxcor and max-pycor to large numbers, it stops working. I need a world of, for example, size 50000 x 50000.
Thanks for your help.
See http://ccl.northwestern.edu/netlogo/docs/faq.html#howbig , which says in part: "The NetLogo engine has no fixed limits on size..."
It's highly unlikely that you'll be able to fit a 50,000 x 50,000 world in your computer, though - that's 2.5 billion patches. Memory usage in NetLogo is proportional to the number of agents, and patches are agents too.
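A quick back-of-the-envelope calculation shows why (the per-patch byte count is an illustrative assumption; the real figure depends on the NetLogo version and on how many patch variables you define):

```python
patches = 50_000 * 50_000      # 2.5 billion patches
bytes_per_patch = 100          # hypothetical rough per-patch overhead
print(patches * bytes_per_patch / 2**30)  # ~233 GiB, far beyond typical RAM
```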
You might take Stephen Guerin's advice at http://netlogo-users.18673.x6.nabble.com/Re-Rumors-of-Relogo-tp4869241p4869247.html on how to avoid needing an enormous patch grid when modeling transportation networks.

MATLAB - Curves of Pursuit (Predator/Prey)

In my engineering class we are programming a "non-trivial" predator/prey pursuit problem.
Here's the gist of the situation:
There is a prey that is trying to escape a predator. Each can be modeled as a particle that can be animated in MATLAB (we have to use this coding language).
The prey:
can maneuver (turn) easier than the predator can
The predator:
can move faster than the prey
I have to create code for both the predator and the prey, which will be used in a class competition.
This is basically what the final product will look like:
http://www.brown.edu/Departments/Engineering/Courses/En4/Projects/pred_prey.gif
The goal is to catch the other team's prey in the shortest amount of time, and for my prey to become un-catchable for the other team's predator (or at least escape for a long period of time).
Here are the specific design constraints:
Predator and prey can only move in the x-y plane.
Simulations will run for a time period of 250 seconds.
Both predator and prey will be subjected to three forces: (a) the propulsive force; (b) a viscous drag force; and (c) a random time-varying force (all equations are given). The propulsive forces will be determined by functions provided by the two competing groups.
The predator is assumed to catch the prey if the distance between predator and prey drops below 1 m.
You may not use the rand() function in computing your predator/prey forces - the only random forces should be those generated by the script provided. (Equations of motion with random forces are impossible for the ODE solver to integrate, and it ends up in an infinite loop.)
For the competition, we will provide the MATLAB code that will compute and animate the trajectories of the competitors and determine the winner of each contest. The test code will work in SI units.
I am looking for any resources that may be able to help me with some strategy. I have looked at basic pursuit curves, but I would love to look at some examples where the prey is not moving in a straight line. Any other coding advice or strategies would be greatly appreciated!
It's a good idea to start with the fundamentals in any field, and you can't go past the work of Isaacs (Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization). This will almost certainly end up as a reference in any academic write-up you produce.
Steven LaValle's excellent book Motion Planning has a number of aspects that may be of interest, including a section on visibility-based pursuit-evasion.
As with many mathematical topics, Wolfram MathWorld has some good diagrams and links that might get you thinking in the right direction (e.g. Pursuit Curves).
If you want to look at a curious problem in the area that is well understood, try the homicidal chauffeur problem - this will at least give you some grounds for comparing the complexity and efficiency of different techniques. In particular, it is probably a good way to get a feel for level set methods (the paper "Homicidal Chauffeur Game: Computation of Level Sets of the Value Function" by Patsko and Turova has a number of images that might be helpful).
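To build intuition for the basic dynamics before worrying about strategy, here is a minimal sketch of the setup described in the question (a propulsive force aimed at the opponent plus viscous drag, integrated with forward Euler). It is written in Python for brevity, though the assignment itself requires MATLAB, and every constant here is invented:

```python
import numpy as np

dt, steps = 0.01, 25_000                 # 250 s of simulated time
m_pred = m_prey = 1.0                    # hypothetical masses
F_pred, F_prey = 1.2, 1.0                # the predator has more propulsive force
c = 0.5                                  # viscous drag coefficient

p_pred, v_pred = np.zeros(2), np.zeros(2)
p_prey, v_prey = np.array([10.0, 5.0]), np.zeros(2)

for _ in range(steps):
    d = p_prey - p_pred                  # vector from predator to prey
    dist = np.linalg.norm(d)
    if dist < 1.0:                       # capture condition from the rules
        print("caught")
        break
    # Pure pursuit: the predator pushes straight at the prey's current position.
    a_pred = (F_pred * d / dist - c * v_pred) / m_pred
    # Naive evasion: the prey pushes directly away from the predator.
    a_prey = (F_prey * d / dist - c * v_prey) / m_prey
    v_pred += a_pred * dt; p_pred += v_pred * dt
    v_prey += a_prey * dt; p_prey += v_prey * dt
```

With these constants the faster predator eventually closes the gap; a smarter prey exploits its turning advantage by curving instead of running straight, which a sketch like this makes easy to experiment with.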