accessing command line arguments for headless NetLogo in the Matlab extension - matlab

I'm running the matlab extension for netlogo in headless(non-gui) mode. I've downloded the extension source and am trying to access the command line arguments from the java code in the extension. The command line arguments are stored in LabInterface.Settings. I would like to be able to access that object in the java code of the extension. I've been working on this for a couple of days but have had not success. It seems the extension process is designed to create primitives to be used inside netlogo. These primitives have knowledge of the different netlogo objects but there is no way for the extension java code to access it. I would appreciate any help.
I would like to be able to run multiple netlogo-matlab analyses with varying parameters, in batch mode accross multiple machines, perhaps a flux cluster. I need to run in headless because of the batch nature. Sometimes the runs will be on the same machine, sometimes split accross multiple machines, flux or condor. I know a similar functionality exist in netlogo for running varying parameters in a single session. Is there some way to split these accross multiple machines?
Currently, I create a series of setup files for netlogo. Each setup file represents the paramenters that vary for that run. Then I submit each netlogo - setup file combination as a single run. Each run can be farmed out to a seperate machine or processor. Adding the matlab extension complicates this. The matlab extension connects it's server to port 9999. With multiple servers running they all get attached to port 9999 and this causes problems. I was hoping to get information from the setup-file name to create independent port numbers tied to the setup file names. This way I could create a unique socket for each setup file, and hence a unique server connection for each netlogo run.

NetLogo doesn't provide a facility for distributing model runs on a cluster, but various people have done it anyway. See:
http://ccl.northwestern.edu/netlogo/docs/faq.html#cluster
https://github.com/jurnix/netlogo-cluster
http://mass.aitia.ai/index.php/intro/meme
and past threads about it on the netlogo-users group. There is no single standard solution.
As for getting access to LabInterface.Settings, it appears to me from looking through the NetLogo source code that the settings object isn't actually stored anywhere. It's just handed off from method to method, ultimately to lab.Lab.run, without ever actually being kept. So trying to get access to the name of the setup file won't work.
So you'll need to some other way for to make the extension generate unique port numbers. Seems to me like there's any number of possible solutions to this. At the time you generate the setup file you know its name, so you could generate a port number at the same time and include it in the experiment definition contained in the file. Or you could pass a port number in a Java system property (using -D) at the time you start NetLogo. Or you could generate a port number based on the process id of the JVM process. Or you could have the extension try port 9999 and see if it's already in use, and if it is, then try a different port number. That's just a few ideas... I could probably come up with ten more.

Related

Sequence Tagging in batch with Mallet cmd prompt

I have tested the SimpleTagger for Sequence Tagging on mallet's cmd prompt interface. I would now like to train over many files and run tests in batches. Is it also possible to do this on mallet's command prompt? I want to get some hint on the performance of the algorithm for the task at hand before I dive into using the JAVA API.
I have seen that Classification tasks can be run in batch from the command prompt.
is it possible to use SimpleTagger in batch? if no
Can someone point me to a reference code where Sequence Tagging has been done in batch using the java API.
Somewhere I found a reference to "http://mallet.cs.umass.edu/index.php/Command_line_tutorial", but the link seems to be broken.
After some exploration, I learned that it was not possible to readily use the cc.mallet.fst.SimpleTagger for batch evaluations. Instead, I found out that the cc.mallet.examples.TrainCRF is a handy code (that uses the SimpleTagger). The code takes a train and test datasets (in Mallet sequence tagging format, instances separated by single-line) as input arguments and that's it.
I used the mallet-2.0.8 installation available on the Mallet page.
Beware to NOT tune the models based on the performance on the test set. You should avoid that and perhaps not verify the performance on test set until you have tuned the model on the training set sufficiently.

Running Netlogo headless on the cloud

I've written a NetLogo model to model agent movement in a landscape. I'd like to run this model from the command prompt, using AWs/Google Compute. The model uses about 500MB worth of input rasters and shapefiles and writes rasters and csv files. It also uses the extensions gis, rnd, cf, table and csv.
Would this be possible using the Controlling API? (https://github.com/NetLogo/NetLogo/wiki/Controlling-API). Can I just use the steps listed in the link? I have not tried running NetLogo from the command prompt before.
Also, I do not want to run BehaviourSpace as it is not relevant to this model.
A BehaviorSpace experiment can consist of only a single run, so BehaviorSpace may actually be relevant to you here. You only need to write one short XML file (or no new files at all, if the experiment setup you want is already part of the model) to do it this way.
Whereas if you go through the controlling API, you will have to write and compile Java (or Scala) code, which is a substantially more complex task.
But if you decide to go the controlling API route: yes, that works too, and it is documented, as you've already noticed.

managing instances of powerCLI script

I wrote a powerCLI script that can automatically deploy a new VM with some given parameters.
In few words, the script connects to a given VC and start the deployment from an existing template.
Can I regulate the number of instances of my script that will run on the same computer ?
Can I regulate the number of instances of my script that will run on different computers but when both instances will be connected to the same VC ?
To resolve the issue i thought of developing a server side appilcation where each instance of my script will connect to, and the server will then handle all the instances , but i am not sure if such thing is possible in powerCLI/Powershell.
Virtually anything is poshable, or so they say. What you're describing may be overkill, however, depending on your scenario. Multiple instances of the same script will each run in its own Powershell process. Virtual Center allows hundreds of simultaneous connections. Of course the content or context of your script might dictate that it shouldn't run in simultaneous instances. I haven't experimented, but it seems like there are ways to determine the name of running Powershell scripts. So if you keep the script name consistent on each computer, you could probably build in some checks along the lines of the linked answer.
But depending on your particulars, it might be easier to go a different way. For example, if you don't want the script to run simultaneously because you have hard-coded the name of a new-osCustomizationSpec, for example, a simple\klugey solution might be to do a check for that new spec, and disconnect/exit/rollback if it exists. A better solution might be to give the new spec a unique name. But the devil is in the details. Hope that helps a bit.

Explain the steps for db2-cobol's execution process if both are db2 -cobol programs

How to run two sub programs from a main program if both are db2-cobol programs?
My main program named 'Mainpgm1', which is calling my subprograms named 'subpgm1' and 'subpgm2' which are a called programs and I preferred static call only.
Actually, I am now using a statement called package instead of a plan and one member, both in 'db2bind'(bind program) along with one dbrmlib which is having a dsn name.
The main problem is that What are the changes affected in 'db2bind' while I am binding both the db2-cobol programs.
Similarly, in the 'DB2RUN'(run program) too.
Each program (or subprogram) that contains SQL needs to be pre-processed to create a DBRM. The DBRM is then bound into
a PLAN that is accessed by a LOAD module at run time to obtain the correct DB/2 access
paths for the SQL statements it contains.
You have gone from having all of your SQL in one program to several sub-programs. The basic process
remains the same - you need a PLAN to run the program.
DBA's often suggest that if you have several sub-programs containing SQL that you
create PACKAGES for them and then bind the PACKAGES into a PLAN. What was once a one
step process is now two:
Bind DBRM into a PACKAGE
Bind PACKAGES into a PLAN
What is the big deal with PACKAGES?
Suppose you have 50 sub-programs containing SQL. If you create
a DBRM for each of them and then bind all 50 into a PLAN, as a single operation, it is going
to take a lot of resources to build the PLAN because every SQL statement in every program needs
to be analyzed and access paths created for them. This isn't so bad when all 50 sub-programs are new
or have been changed. However, if you have a relatively stable system and want to change 1 sub-program you
end up reBINDing all 50 DBRMS to create the PLAN - even though 49 of the 50 have not changed and
will end up using exactly the same access paths. This isn't a very good apporach. It is analagous to compiling
all 50 sub-programs every time you make a change to any one of them.
However, if you create a PACKAGE for each sub-program, the PACKAGE is what takes the real work to build.
It contains all the access paths for its associated DBRM. Now if you change just 1 sub-program you
only have to rebuild its PACKAGE by rebinding a single DBRM into the PACKAGE collection and then reBIND the PLAN.
However, binding a set of PACKAGES (collection) into a PLAN
is a whole lot less resource intensive than binding all the DBRM's in the system.
Once you have a PLAN containing all of the access paths used in your program, just use it. It doesn't matter
if the SQL being executed is from subprogram1 or subprogram2. As long as you have associated the PLAN
to the LOAD that is being run it should all work out.
Every installation has its own naming conventions and standards for setting up PACKAGES, COLLECTIONS and
PLANS. You should review these with your Data Base Administrator before going much further.
Here is some background information concerning program preperation in a DB/2 environment:
Developing your Application

matlab distributed computing with sge(qsub)

Recently I got access to run my codes on a cluster. My code is totally paralleizable but I don't know how to best use its parallel nature. I've to compute elements of a big matrix and each of them are independent of the others. I want to submit the job to run on several machine (like 100) to speed up the computation of the matrix.
Right now, I wrote a script to submit multiple jobs each responsible to compute a part of the matrix and save it in a .mat file. At the end I'm merging them to get the whole matrix. For submitting each individual job, I've created a new .m file (run1.m, run.2, ...) to set a variable and then run the function to compute the associated part in the matrix. So basically run1.m is
id=1;compute_dists_matrix
and then compute_dists_matrix uses id to find the part it is going to compute. Then I wrote a script to create run1.m through run60.m and the qsub them to the cluster.
I wonder if there is a better way to do this using some MATLAB features for example. Because this seems to be a very typical task.
Yes, it works, but is not ideal, and as you say is a common problem. Matlab has a parallel programming toolkit.
Does your cluster have this? If so, the distributed arrays is worth having a look at. If they don't have access to it, then what you are doing is the only other way. You can wrap your run1.m,run2.m in a controlling script to automate it for you...
I believe you could use command line arguments for the id and submit jobs with a range of values for this id. Command line arguments can be processed by launching MATLAB from the command line without the IDE and providing the name of the script to be executed and the list of arguments. I would think you can set up dependencies in your job manager and create a "reduce" script to merge the partial results (from files). The whole process could be managed from a single script that would generate the id & other necessary arguments and submit the processing & postprocessing jobs with dependencies.