Slurm: Why use srun inside sbatch? - hpc

In a sbatch script, you can directly launch programs or scripts (for example an executable file myapp) but in many tutorials people use srun myapp instead.
Despite reading some documentation on the topic, I do not understand the difference and when to use each of those syntaxes.
I hope this question is precise enough (1st question on SO), thanks in advance for your answers.

The srun command is used to create job 'steps'.
First, it will bring better reporting of the resource usage ; the sstat command will provide real-time resource usage for processes that are started with srun, and each step (each call to srun) will be reported individually in the accounting.
Second, it can be used to setup many instances of a serial program (program that only use one CPU) into a single job, and micro-schedule those programs inside the job allocation.
Finally, for parallel jobs, srun will also play the important role of starting the parallel program and setup the parallel environment. It will start as many instances of the program as were requested with the --ntasks option on the CPUs that were allocated for the job. In the case of a MPI program, it will also handle the communication between the MPI library and Slurm.

Related

Does assigning more nodes to a job on a SLURM server increase available RAM?

I am working with a program that needs a lot RAM. Currently I am running it on a SLURM cluster. Each node has 125GB RAM. When submitting the job to a single node it eventually fails as it runs out of memory. My rather naive question, as I am new to working on servers, is:
Does assigning more nodes with the command --nodes flag increase available RAM for the submitted job?
For example:
When assigning 10 nodes instead of 1, with the command below, the program fails at the same point as with with one node.
#SBATCH --nodes=10
Is there some other way to combine RAM from multiple nodes for a single job?
Any and all advice is welcome!
That depends on your program, but most likely no.
To use multiple nodes on a Slurm Cluster (or any cluster, for that matter), your program needs to be set up in very specific way, ie. you need inter node communictaion. This is usually done via MPI and the whole program has to be designed around it.
So if your program uses MPI it may be able to split the workload over several nodes. And even that does not guarantee lower memory as that is usually not the goal of such a parallelization.

Matlab/Simulink: run batch of simulations in parallel?

I have to run a series of simulations and save the results. Since by default Matlab only uses one core, I wonder if it is possible to open multiple worker tasks and assign different simulation runs to them?
You could run each simulation in a separate MATLAB instance and let the OS handle the process to core assignment.
One master MATLAB could synchronize each child instances checking for example if simulation results file are existing.
I aso have the same problem but I did not manage to really understand how to make it in MatLab. The documentation in matlab is too advanced to get to know how to make it.
Since I am working with Ubuntu I find a way to do the work calling the unix command from MatLab and using the parallel GNU command
So I mange to run my simulation in parallel with 4 cores.
unix('parallel --progress -j4 flow > /dev/null :::: Pool.txt','-echo')
you can find more info in the link
Shell, run four processes parallel
Details of the syntaxis can be found at https://www.gnu.org/software/parallel/
but breifly I can tell you
--progress shows a status of the progress
-j4 tells the amount or jobs in parallel you want to have
flow is the name of my simulator
/dev/null was just to avoid the screen run output of the simulator to show up
Pool.txt is a file I made with the required simulator input that is basically the path and the main simulator file.
echo I do not remember now what was it for :D

Run multiple instance of a process on a number of servers

I would like to run multiple instances of a randomized algorithm. For performance reason, I'd like to distribute the tasks on several machines.
Typically, I run my program as follows:
./main < input.txt > output.txt
and it takes about 30 minutes to return a solution.
I would like to run as many instances of this as possible, and ideally not change the code of the program. My questions are:
1 - What online services offer computing resources that would suit my need?
2 - Practically, how should I launch remotely all the processes, get notified of the termination, and then aggregate the results (basically, pick up the best solution). Is there a simple framework that I could use or should I look into ssh-based scripting?
1 - What online services offer computing resources that would suit my need?
Amazon EC2.
2 - Practically, how should I launch remotely all the processes, get notified of the termination, and then aggregate the results (basically, pick up the best solution). Is there a simple framework that I could use or should I look into ssh-based scripting?
Amazon EC2 has an API for launching virtual machines. Once they're launched, you can indeed use ssh to control jobs, and I would recommend this solution. I would expect that other softwares for distributed job management exist, but they aren't likely to be any simpler to configure than ssh.

Matlab process termination in slurm

I have two questions that to me seem related:
First, is it necessary to explicitly terminate Matlab in my sbatch command? I have looked through several online slurm tutorials, and in some cases the authors include an exit command:
http://www.umbc.edu/hpcf/resources-tara-2013/how-to-run-matlab.html
And in some they don't:
http://www.buffalo.edu/ccr/support/software-resources/compilers-programming-languages/matlab/PCT.html
Second, when creating a parallel pool in a job, I almost always get the following warning:
Warning: Found 4 pre-existing communicating job(s) created by pool that are
running, and 2 communicating job(s) that are pending or queued. You can use
'delete(myCluster.Jobs)' to remove all jobs created with profile local. To
create 'myCluster' use 'myCluster = parcluster('local')'
Why is this happening, and is there any way to avoid it happening to myself and to others because of me?
It depends on how you launch Matlab. Note that your two examples use distinct methods for running a matlab script; the first one uses the -r option
matlab -nodisplay -r "matrixmultiply, exit"
while the second one uses stdin redirection from a file
matlab < runjob.m
In the first solution, the Matlab process will be left running after the script is finished, that is why the exit command is needed there. In the second solution, the Matlab process is terminated as stdin closes when the end of the file is reached.
If you do not end the matlab process, Slurm will kill it when the maximum allocation time is reached, as defined by the --time option in you submission script or by the default cluster (or partition) value.
To avoid the warning you mention, make sure to systematically use matlabpool close at the end of your job. If you have several instances of Matlab running on the same node, and you have a shared home directory, you will probably get the warning anyhow, as I believe the information about open matlab pools is stored in a hidden folder in your home. Rebooting will probably not help, but finding those files and removing them will (be careful though and ask the system administrator).
to avoid your warning, you have to delete
.matlab/local_cluster_jobs/
directory

Queuing systems - what is a good way to start up multiple workers?

How have you set-up one or more worker scripts for queue-oriented systems?
How do you arrange to startup - and restart if necessary - worker scripts as required? (I'm thinking about such tools as init.d/, Ruby-based 'god', DJB's Daemontools, etc, etc)
I'm developing an asynchronous queue/worker system, in this case using PHP & BeanstalkdD (though the actual language and daemon isn't important). The tasks themselves are not too hard - encoding an array with the commands and parameters into JSON for transport through the Beanstalkd daemon, picking them up in a worker script to action them as required.
There are a number of other similar queue/worker setups out there, such as Starling, Gearman, Amazon's SQS and other more 'enterprise' oriented systems like IBM's MQ and RabbitMQ. If you run something like Gearman, or SQS - how do you start and control the worker pool? The questions is on the initial worker startup, and then being able to add additional extra workers, shutting them down at will (though I can send a message through the queue to shut them down - as long as some 'watcher' won't automatically restart them). This is not a PHP problem, it's about straight Unix processes of setting up one or more processes to run on startup, or adding more workers to the pool.
A bash script to loop a script is already in place - this calls the PHP script which then collects and runs tasks from the queue, occasionally exiting to be able to clean itself up (it can also pause a few seconds on failure, or via a planned event). This works fine, and building the worker processes on top of that won't be very hard at all.
Getting a good worker controller system is about flexibility, starting one or two automatically on a machine start, and being able to add a couple more from the command line when the queue is busy, shutting down the extras when no longer required.
I've been helping a friend who's working on a project that involves a Gearman-based queue that will dispatch various asynchronous jobs to various PHP and C daemons on a pool of several servers.
The workers have been designed to behave just like classic unix/linux daemons, thanks to simple shell scripts in /etc/init.d/, and commands like :
invoke-rc.d myWorker start|stop|restart|reload
This mechanism is simple and efficient. And as it relies on standard linux features, even people with a limited knowledge of your app can launch a daemon or stop one, if they know how it's called system-wise (aka "myWorker" in the above example).
Another advantage of this mechanism is it makes your workers pool management easy as well. You could have 10 daemons on your machine (myWorker1, myWorker2, ...) and have a "worker manager" start or stop them depending on the queue length. And as these commands can be run through ssh, you can easily manage several servers.
This solution may sound cheap, but if you build it with well-coded daemons and reliable management scripts, I don't see why it would be less efficient than big-bucks solutions, for any average (as in "non critical") project.
Real message queuing middleware like WebSphere MQ or MSMQ offer "triggers" where a service that is part of the MQM will start a worker when new messages are placed into a queue.
AFAIK, no "web service" queuing system can do that, by the nature of the beast. However I have only looked hard at SQS. There you have to poll the queue, and in Amazon's case overly eager polling is going to cost you some real $$.
I've recently been working on such a tool. It's not entirely finished (thought it should take more than a few more days before I hit something I could call 1.0) and clearly not ready for production yet, but the important part are already coded. Anybody can have a look at the code here: https://gitorious.org/workers_pool.
Supervisor is a good monitor tool. It includes a web UI where you can monitor and manage workers.
Here is a simple config file for a worker.
[program:demo]
command=php worker.php ; php command to run worker file
numprocs=2 ; number of processes
process_name=%(program_name)s_%(process_num)03d ; unique name for each process if numprocs > 1
directory=/var/www/demo/ ; directory containing worker file
stdout_logfile=/var/www/demo/worker.log ; log file location
autostart=true ; auto start program when supervisor starts
autorestart=true ; auto restart program if it exits