Running NetLogo on an HPC machine: how to specify the number of cores to be used?

$ wget
$ tar -xzf netlogo-5.1.0.tar.gz
$ ~/netlogo-5.1.0/ \
--model ~/myproject/MyModel.nlogo \
--experiment MyExperiment \
--table ~/myproject/MyNewOutputData.csv
I am using the above commands to run NetLogo headless on an HPC machine. How do I specify the number of cores to be used, or does it take the maximum available by default?

A look at the BehaviorSpace documentation reveals:
--threads <number>: use this many threads to do model runs in parallel, or 1 to disable parallel runs. defaults to one thread per processor.
This is equivalent to the same setting in the BehaviorSpace GUI.
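As a rough illustration of the --threads option quoted above, the sketch below assembles the headless command line in Python with an explicit thread count. The function name, the launcher path, and the thread count of 4 are placeholders chosen for this example; the other flags are the ones used in the question.

```python
def build_netlogo_command(launcher, model, experiment, table, threads):
    """Return the argument list for a headless BehaviorSpace run.

    `launcher` is whatever script starts NetLogo headless on your system
    (hypothetical placeholder here). `threads` is passed to --threads.
    """
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return [
        launcher,
        "--model", model,
        "--experiment", experiment,
        "--table", table,
        "--threads", str(threads),  # 1 disables parallel runs
    ]


if __name__ == "__main__":
    cmd = build_netlogo_command(
        "/path/to/netlogo-launcher",  # hypothetical path
        "~/myproject/MyModel.nlogo",
        "MyExperiment",
        "~/myproject/MyNewOutputData.csv",
        threads=4,  # assumed value; match it to the cores you requested
    )
    print(" ".join(cmd))
```

If the option is left out, the quoted documentation says NetLogo defaults to one thread per processor, so an explicit value mainly matters when the scheduler grants fewer cores than the node has.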


Pausing an os.system command over a slurm scheduler if the first command doesn't allocate

I am running a script multiple times over a multinode cluster, and this script processes data sequentially over the cluster. Here is the code:
import os
os.system("srun -p rs2 --mem-per-cpu 200G -t 7-23:00:00 python3 /home/usr/Sim/ aok; srun -p rs2 python3")
The issue is that if the first statement i.e.
srun -p rs2 --mem-per-cpu 200G -t 7-23:00:00 python3 /home/usr/Sim/ aok
needs to wait for an allocation of resources, then the second statement immediately executes. The second command I have relies on the first command being fully executed. Is there a way to make the second statement wait until the first statement allocates and fully finishes?
I would suggest using subprocess.Popen instead of os.system. This will let you use .wait(), where you can check the status of a command for completion.
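A minimal sketch of that suggestion follows. It starts each command with subprocess.Popen and calls .wait() before moving on, so a later step never starts until the earlier one has exited. The helper name run_in_order is made up for this example, and the two commands are harmless stand-ins for the srun invocations in the question.

```python
import subprocess
import sys


def run_in_order(commands):
    """Run each command only after the previous one has exited.

    Returns the exit codes of the commands that were run. Stops at the
    first non-zero exit code so dependent steps are not started.
    """
    return_codes = []
    for argv in commands:
        proc = subprocess.Popen(argv)
        code = proc.wait()  # blocks until this command has finished
        return_codes.append(code)
        if code != 0:
            break
    return return_codes


if __name__ == "__main__":
    # Stand-ins for the two srun commands; replace with your own argument lists.
    steps = [
        [sys.executable, "-c", "print('first step done')"],
        [sys.executable, "-c", "print('second step done')"],
    ]
    print(run_in_order(steps))
```

Passing each command as a list of arguments, rather than one shell string, also avoids relying on the shell's `;` handling.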

How to get output of gcloud composer command?

I'm executing gcloud composer commands:
gcloud composer environments run airflow-composer \
--location europe-west1 --user-output-enabled=true \
backfill -- -s 20171201 -e 20171208 dags.my_dag_name
kubeconfig entry generated for europe-west1-airflow-compos-007-gke.
It's a regular Airflow backfill. The command above prints the results only at the end of the whole backfill range. Is there any way to get the output in a streaming manner, so that each time a DAG run gets backfilled it is printed to standard output, as with the regular Airflow CLI?

Forcing LSF to execute jobs on different hosts

I have a setup consisting of 3 workers and a management node, which I use for submitting tasks. I would like to run a setup script concurrently on all workers:
bsub -q queue -n 3 -m 'h0 h1 h2' -J "%J_%I" mpirun
As far as I understand, I can use the 'ptile' resource constraint to force execution on all workers:
bsub -q queue -n 3 -m 'h0 h1 h2' -J "%J_%I" -R 'span[ptile=1]' mpirun
However, occasionally my script gets executed several times on the same worker.
Is this expected behavior, or is there a bug in my setup? Is there a better way to enforce execution on multiple workers?
Your understanding of span[ptile=1] is correct. LSF will only use 1 core per host for your job. If there aren't enough hosts based on the -n then the job will pend until something frees up.
As for the script occasionally being executed several times on the same worker: I suspect that it is something in your script. For example, LSF appends to the stdout file by default; use -oo to overwrite it.
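To make the two pieces of this answer concrete, here is a small sketch that assembles the bsub argument list with both span[ptile=1] and -oo. The helper name and the output file name setup.out are assumptions for illustration only; the queue, host list, and job name are taken from the question, and the command to run stays whatever you pass in.

```python
def build_bsub_command(queue, hosts, job_name, output_file, command):
    """Return a bsub argument list that asks for one slot on each listed host."""
    if len(set(hosts)) != len(hosts):
        raise ValueError("host list contains duplicates")
    return [
        "bsub",
        "-q", queue,
        "-n", str(len(hosts)),    # one slot per host in the list
        "-m", " ".join(hosts),
        "-J", job_name,
        "-R", "span[ptile=1]",    # at most one slot per host
        "-oo", output_file,       # overwrite the output file instead of appending
    ] + list(command)


if __name__ == "__main__":
    cmd = build_bsub_command(
        "queue", ["h0", "h1", "h2"], "%J_%I",
        "setup.out",              # hypothetical output file name
        ["mpirun"],               # command as given in the question
    )
    print(cmd)
```

With a fresh output file on every submission, it becomes easier to tell whether a repeated line comes from the current job or from an earlier one.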

Run cloud-init cloud-config yaml file

How do I run, for development purposes, a cloud-init YAML file that would normally be run via user-data?
I know how to re-run cloud-init, but I want to develop a complicated cloud-init file, and continually building new instances to do that is rather difficult.
Sorry to say, you're going to have to run it on a new clean instance (or at least a snapshot of one). Even if you did manually go back and start at different steps, there are potentially side effects.
I think you'll find that if you get used to managing local VMs, you can debug your scripts fairly quickly.
The quickest path for iterating on user-data input to cloud-init is probably via lxd. You can quickly set up lxd on a vm host or a bare metal system. Once set up, launches are very quick.
$ cat ud.yaml
#cloud-config
runcmd:
  - "read up idle < /proc/uptime; echo Up $up seconds | tee /run/runcmd.log"
$ lxc launch ubuntu-daily:bionic ud-test "--config=user.user-data=$(cat ud.yaml)"
Creating ud-test
Starting ud-test
$ lxc exec ud-test cat /run/runcmd.log
Up 8.05 seconds
$ lxc stop ud-test
$ lxc delete ud-test
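If you repeat this cycle often, it can be scripted. The sketch below only builds the same four lxc commands shown in the session above as argument lists; the function name and its defaults are choices made for this example, and nothing is executed unless you hand the lists to subprocess yourself.

```python
def lxd_test_cycle(user_data, name="ud-test", image="ubuntu-daily:bionic",
                   log_path="/run/runcmd.log"):
    """Return the lxc commands for one launch/inspect/teardown cycle.

    `user_data` is the text of the cloud-config file (for example the
    contents of ud.yaml).
    """
    return [
        ["lxc", "launch", image, name, f"--config=user.user-data={user_data}"],
        ["lxc", "exec", name, "cat", log_path],
        ["lxc", "stop", name],
        ["lxc", "delete", name],
    ]


if __name__ == "__main__":
    example = '#cloud-config\nruncmd:\n  - "echo hello | tee /run/runcmd.log"\n'
    for argv in lxd_test_cycle(example):
        print(argv)
```

In a real run you would probably need to wait for cloud-init to finish inside the container before reading the log, since the launch command returns as soon as the container has started.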
You might be able to get away with just running cloud-init clean and then re-running it.
I'm experimenting with cloud-init and using an Ubuntu box with KVM as a virtualization lab. I made a simple Makefile to build the cloud-init image and launch it in a KVM instance.
You can see my code here:
all: clean build run

# INSTANCE_NAME is used below but is not defined in this excerpt; define it
# here or pass it on the command line (make INSTANCE_NAME=<name>).
CLOUD_IMAGE_FILE = "bionic-server-cloudimg-amd64.img"

clean:
	@echo "Removing build artifacts"
	-@rm -f config.img 2>/dev/null
	-@virsh destroy $(INSTANCE_NAME) 2>/dev/null || true
	-@virsh undefine $(INSTANCE_NAME) 2>/dev/null || true
	-@rm -f $(INSTANCE_NAME).img

build:
	@echo "Building cloud config drive"
	cloud-localds config.img config.yaml

run:
	@echo "Spawning instance $(INSTANCE_NAME)"
	virt-install \
		--name $(INSTANCE_NAME) \
		--memory 8192 \
		--disk ./$(INSTANCE_NAME).img,device=disk,bus=virtio \
		--disk ./config.img,device=cdrom \
		--os-type linux \
		--os-variant ubuntu18.04 \
		--virt-type kvm \
		--graphics none \
		--network bridge=br0
I am not sure why this answer is not here.. maybe it is not applicable for earlier versions.
All I do to re-run cloud-init for dev testing (especially when testing user-data changes) is:
1 - change the config file/files, usually only /etc/cloud/cloud.cfg
2 - run clean:
cloud-init clean -l
-l cleans the logs also
3 - re-run cloud-init
cloud-init init
Of course, this has its limitations: depending on the settings you test, cloud-init clean is not going to revert previous changes, but you may be able to figure out ways to test anyway. For example, I am testing the creation of new users, so every time I change something in a user's settings and want to test it, I create a new user.
All of this is a quick in-development test; if you need to truly verify your changes, you need a new instance.
Re-running all of cloud-init without a system reboot isn't a recommended approach, because some parts of cloud-init run at systemd generator time to detect new datasource types. That said, the following commands will allow you to accomplish this without a reboot.
cloud-init supports a clean subcommand to remove all semaphore files and allow cloud-init to re-run all config modules again. Beware that this will mean SSH host-keys are regenerated and .ssh config files re-written so it could impact your ability to get back into the VM.
To clean all semaphores so cloud-init modules will all re-run on next boot:
sudo cloud-init clean --logs
cloud-init typically runs multiple boot stages in sequence due to systemd service dependencies. If you want to repeat that process without a reboot you can run the following 4 commands:
Detect local datasource (cloud platform) and obtain user-data:
sudo cloud-init init --local
Detect any datasources and user-data which require network up and run cloud_init_modules defined in /etc/cloud/cloud.cfg:
sudo cloud-init init
Run all cloud_config_modules defined in /etc/cloud/cloud.cfg:
sudo cloud-init modules --mode=config
Run all cloud_final_modules defined in /etc/cloud/cloud.cfg:
sudo cloud-init modules --mode=final
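The sequence above (clean, then the four stages) can be wrapped in a small script. This is only a sketch: the function name, the dry_run default, and the use_sudo switch are choices made for this example, and the commands themselves are exactly the ones listed in this answer.

```python
import subprocess

# The clean step followed by the four boot stages, in the order given above.
RERUN_STEPS = [
    ["cloud-init", "clean", "--logs"],
    ["cloud-init", "init", "--local"],
    ["cloud-init", "init"],
    ["cloud-init", "modules", "--mode=config"],
    ["cloud-init", "modules", "--mode=final"],
]


def rerun_cloud_init(use_sudo=True, dry_run=True):
    """Return the commands in order; execute them only when dry_run is False."""
    planned = []
    for step in RERUN_STEPS:
        argv = (["sudo"] if use_sudo else []) + step
        planned.append(argv)
        if not dry_run:
            # check=True stops the sequence if a stage fails
            subprocess.run(argv, check=True)
    return planned


if __name__ == "__main__":
    for argv in rerun_cloud_init():  # dry run: only prints the plan
        print(" ".join(argv))
```

Keeping dry_run as the default is deliberate, given the warning above that cleaning regenerates SSH host keys and can affect your access to the VM.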

WEKA Command Line Parameters

I am able to run Weka from the CLI using the command below:
java -cp weka.jar weka.classifiers.functions.MultilayerPerceptron -t Dataset.arff
How can I set the target parameters, for example "Number of time units for forecast", from the command line?
We are trying to use the command line to improve memory utilization: we have a large dataset with 10000 attributes, which causes a Java heap space error every time we run it from the GUI.
Thanks for the response.
Posting answer to my own question:
java -cp weka.jar weka.Run weka.classifiers.timeseries.WekaForecaster -W "weka.classifiers.functions.MultilayerPerceptron -L 0.01 -M 0.2 -N 5000 -V 0 -S 0 -E 20 -H 20 " -t <dataset file> -F <FieldList> -L 1 -M 3 -prime 3 -horizon 6
We can always get more help using:
java -cp weka.jar weka.Run -h