I am running a greedy feature selection algorithm, and I am attempting to use job arrays to explore parallelization.
The idea is that we have three steps, each of which depends on the previous one:
Step 1: Setup for iteration i
Step 2: Fit models at iteration i
Step 3: Find best model at iteration i
Because you need all the models (>10) to have finished training before starting step 3, plain old job chaining is not optimal.
So I am trying to use job arrays, which do exactly what I want: only when all my models are fitted do I move to step 3.
However, I am having trouble setting up the dependency.
I was told that the dependency for a whole job array needs to be the job ID (which is a number) and not the job name (e.g. runSetup$n_subject$i).
So: how do I get the job ID of a whole job array?
Or better yet: how do I best set a dependency on a whole job array?
This answer is very interesting, but doesn't tell me how to best set a dependency when my job array contains 10 or more jobs.
#!/bin/bash
# Input arguments from the command line
n_subject=$1 # subject to consider
cohort=$2
priors_and_init=$3
nparam=16

for ((i = 1; i <= nparam; i++)); do
    # Run setup; from iteration 2 on, wait for the previous iteration's saveBest job
    if [[ $i -eq 1 ]]; then
        bsub -J "runSetup$n_subject$i" matlab -singleCompThread -nodisplay -r "setup_greedy_forward($n_subject, $cohort, $priors_and_init, $i)"
    else
        last_iter=$((i-1))
        bsub -J "runSetup$n_subject$i" -w "done(saveBest$n_subject$last_iter)" matlab -singleCompThread -nodisplay -r "setup_greedy_forward($n_subject, $cohort, $priors_and_init, $i)"
    fi

    # Fit models as a job array and extract its job ID;
    # get_jobid stands for the part I don't know how to do
    max_sim=$((nparam-i+1))
    JOBID=$(get_jobid bsub -W 08:00 -J "fitDCMs$n_subject[1-$max_sim]" -w "done(runSetup$n_subject$i)" -R "rusage[mem=16000]" matlab -singleCompThread -nodisplay -r "fit_dcm_greedy_forward($n_subject, $cohort, $priors_and_init, \$LSB_JOBINDEX)" 2> /dev/null)

    # For all trained DCMs, get the best model and save it
    if [ -n "$JOBID" ]; then
        bsub -J "saveBest$n_subject$i" -w "numdone($JOBID,*)" matlab -singleCompThread -nodisplay -r "save_best_model($n_subject, $cohort, $priors_and_init, $i)"
    fi
done
The output I am getting:
MATLAB job.
Job <94564566> is submitted to queue <normal.24h>.
MATLAB job.
Job <94564567> is submitted to queue <normal.24h>.
MATLAB job.
saveBest121: No matching job found. Job not submitted.
MATLAB job.
runSetup122: No matching job found. Job not submitted.
[…]
After searching a bit, I found a way to get the job ID.
JOBID=$(bsub command1 | awk '/is submitted/{print substr($2, 2, length($2)-2);}')
if [ -n "$JOBID" ]; then
bsub -w "numdone($JOBID,*)" command2
fi
The first line submits the job and extracts its job ID.
This answer was found here.
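Applied to the loop in my script, the hypothetical get_jobid step then becomes something like this (a sketch with the same variables and job names as above):

JOBID=$(bsub -W 08:00 -J "fitDCMs$n_subject[1-$max_sim]" -w "done(runSetup$n_subject$i)" -R "rusage[mem=16000]" matlab -singleCompThread -nodisplay -r "fit_dcm_greedy_forward($n_subject, $cohort, $priors_and_init, \$LSB_JOBINDEX)" | awk '/is submitted/{print substr($2, 2, length($2)-2);}')
# numdone($JOBID,*) is satisfied once every element of the array has finished
if [ -n "$JOBID" ]; then
    bsub -J "saveBest$n_subject$i" -w "numdone($JOBID,*)" matlab -singleCompThread -nodisplay -r "save_best_model($n_subject, $cohort, $priors_and_init, $i)"
fi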
I was told that the dependency for a whole job array needs to be the job ID (which is a number) and not the job name
Actually, a dependency on the whole array by job name should work. For example:
bsub -J "iterate[1-10]" ...
bsub -J "finalize" -w "done(iterate)" ...
Job finalize won't start until all elements of iterate are done.
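Applied to the script in the question, that would be something like the sketch below. One caveat (my own observation, not from the post): the array is named fitDCMs$n_subject at every iteration, so a name-based dependency could also match an earlier iteration's array; appending $i keeps each iteration's name unique.

bsub -W 08:00 -J "fitDCMs$n_subject${i}[1-$max_sim]" -w "done(runSetup$n_subject$i)" -R "rusage[mem=16000]" matlab -singleCompThread -nodisplay -r "fit_dcm_greedy_forward($n_subject, $cohort, $priors_and_init, \$LSB_JOBINDEX)"
# done(name) on an array is satisfied only when all of its elements are done
bsub -J "saveBest$n_subject$i" -w "done(fitDCMs$n_subject$i)" matlab -singleCompThread -nodisplay -r "save_best_model($n_subject, $cohort, $priors_and_init, $i)"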
Related
I would like to pass LSB_JOBINDEX as an argument to my script instead of using an environment variable.
This makes my script more LSF agnostic and avoids creating a helper script that uses the environment variable.
However, I was not able to use LSB_JOBINDEX in arguments: it only works as part of the initial command string.
For example, from a bash shell, I use the test command:
bsub -J 'myjobname[1-4]' -o bsub%I.log \
'echo $LSB_JOBINDEX' \
'$LSB_JOBINDEX' \
\$LSB_JOBINDEX \
'$LSB_JOBINDEX' \
"\$LSB_JOBINDEX"
and the output of, say, bsub2.log is:
2 $LSB_JOBINDEX $LSB_JOBINDEX $LSB_JOBINDEX $LSB_JOBINDEX
So in this case, only the first $LSB_JOBINDEX got expanded, but not any of the following ones.
But I would rather not pass the entire command as a single huge string, as with 'echo $LSB_JOBINDEX' in this example; I would prefer to use separate arguments as in a regular bash command.
I've also tried to play around with %I but it only works for -o and related bsub options, not for the command itself.
Related: Referencing job index in LSF job array
Tested in LSF 10.1.0. Related documentation: https://www.ibm.com/support/knowledgecenter/en/SSWRJV_10.1.0/lsf_admin/job_array_cl_args.html
bsub will add single quotes around an argument if the argument starts with $. For example, if the bsub command line is
bsub command -a $ARG1 -b $ARG2
then bsub will add quotes around the 2nd and 4th arguments, and the command is stored like this:
command -a '$ARG1' -b '$ARG2'
One way to prevent this is to put the commands in a script, like this:
$ cat cmd
echo $LSB_JOBINDEX
echo "line 2"
echo $LSB_JOBINDEX
Then run your job like this:
$ bsub -I < cmd
Job <2669> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on hostA>>
0
line 2
0
Note that the -I is not needed; it's just so you can see the job output on bsub's stdout.
EDIT
OK, looks like this works, but it's not really a serious answer since it's so ugly. The thing is that bsub will surround an argument with single quotes if the argument starts with $. So the strategy is to make sure that the first character of the argument isn't a $: put any character other than $ first, followed by a backspace literal, followed by the $. Note that it needs to be the actual backspace character, not ^ followed by H; use ctrl-v followed by ctrl-h to append the literal on the command line.
$ bsub -I echo "x^H\$LSB_JOBINDEX" "x^H\$LSB_JOBINDEX"
Job <2686> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on hostA>>
0 0
EDIT2
A tab literal also works, not that it's much better.
$ bsub -I echo " \$LSB_JOBINDEX" " \$LSB_JOBINDEX"
Job <2687> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on hostA>>
0 0
I want to create a job array in Slurm such that it calls a MATLAB function that depends on the array task ID. I tried:
#!/bin/bash
#SBATCH -J TEST
#SBATCH -p slims
#SBATCH -o o
#SBATCH -e e
matlab -r "test(${SLURM_ARRAY_TASK_ID})"
where test.m is the MATLAB function that I want to run. This throws the error "Not enough arguments in line 7 test.m ..."
How should I do it?
It looks like $SLURM_ARRAY_TASK_ID was not defined, and there is no --array parameter in your submission file. So unless you provided that argument on the command line
sbatch --array ... <yourscript.sh>
you did not tell Slurm to create an array.
Either add #SBATCH --array ... to your submission script or specify it on the command line.
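For example, a minimal corrected submission script might look like this (the 1-10 range and the %a suffix on the output files are illustrative assumptions):

#!/bin/bash
#SBATCH -J TEST
#SBATCH -p slims
#SBATCH -o o_%a
#SBATCH -e e_%a
#SBATCH --array=1-10
# --array is what makes this an array job; %a in the file names expands
# to the array task ID so each task gets its own output and error files
matlab -r "test(${SLURM_ARRAY_TASK_ID})"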
I'm trying to submit commands to the LSF scheduler with bsub but this command includes a parameter value that must be quoted and contains a semicolon.
Here is a simple command to illustrate my problem
bsub -o t.o -e t.e echo "foo;bar"
it fails with "line 8: bar: command not found", so I thought I could escape the semicolon but this
bsub -o t.o -e t.e echo "foo\;bar"
causes the same error, so does this
bsub -o t.o -e t.e echo 'foo;bar'
I know I can get around it by writing the command to a script file and executing that as the bsub command, but in this case I am going to test a number of parameters, and it would be much handier to just modify the bsub command rather than edit a shell script each time.
Thanks for your help!
One simple way I can think of to do this is to use bsub's subshell interface: simply execute bsub <options> from your command line without specifying a command. bsub will then prompt you for a command in a subshell, and you can use quotes in this subshell.
Send the subshell an end-of-file (CTRL+D) to let it know you're done. Here's an example run using something similar to your case but running interactively instead of using -o to capture the output:
% bsub -I
bsub> echo "foo;bar"
bsub> <================[### Hit CTRL+D here ###]
Job <5841> is submitted to default queue <normal>.
<<Waiting for dispatch ...>>
<<Starting on hb05b10>>
foo;bar
%
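If you want to keep it non-interactive, passing the entire command as a single quoted string may also work (a sketch; it relies on LSF handing the stored command string to a shell on the execution host):

bsub -o t.o -e t.e "echo 'foo;bar'"

The outer double quotes make the whole command one argument to bsub, so the semicolon and the inner quotes are only interpreted when the job itself runs.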
I'm looking for a way to log information to a file about a submitted job immediately after it starts.
Normally all the job status is appended to the log file after a job has completed, but I'd like to know the information it has when it starts.
I know there's the -B flag but I want it in a file, and I could also do something like:
bsub -J jobby -o run_job.log "bjobs -l -J jobby > jobby.log; run_job"
but maybe someone knows of a funkier way of doing this.
There are some subtle variations that essentially accomplish the same thing:
You can use a pre-exec to do a similar thing instead of doing the bjobs as part of the command:
bsub -J jobby -E "bjobs -l -J jobby > jobby.log" run_job
You can use the job's environment to get your own jobid instead of using -J if you write your submission as a script:
#!/bin/sh
#BSUB -o run_job.log
bjobs -l $LSB_JOBID > $LSB_JOBID.log
run_job
Then submit your job like this:
bsub < jobscript.sh
You can do some combination of the above: use $LSB_JOBID in a pre-execution script.
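For example, a sketch of that combination (the single quotes keep $LSB_JOBID from being expanded by the submitting shell; LSF sets it in the pre-execution environment):

bsub -E 'bjobs -l $LSB_JOBID > $LSB_JOBID.log' -o run_job.log run_job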
That's about as 'funky' as it gets AFAIK :)
Say I submit a job using something like bsub pwd. Now I would like to get the job ID of that job in order to build a dependency for the next job. Is there some way I can get bsub to return the job ID?
Nils and Andrey have the answers to this specific question in shell and C/C++ environments respectively. For the purposes of building dependencies, you can also name your job with -J then build the dependency based on the job name:
bsub -J "job1" <cmd1>
bsub -J "job2" <cmd2>
bsub -w "done(job1) && done(job2)" <cmd>
There's a bit more info here.
This also works with job arrays:
bsub -J "ArrayA[1-10]" <cmd1>
bsub -J "ArrayB[1-10]" <cmd2>
bsub -w "done(ArrayA[3]) && done(ArrayB[5])" <cmd>
You can even do element-by-element dependency. The following job's i-th element will only run when the corresponding element in ArrayB reaches DONE status:
bsub -w "done(ArrayB[*])" -J "ArrayC[1-10]" <cmd3>
You can find more info on the various things you can specify in -w here.
Just as a reference, this is the best solution I could come up with so far. It takes advantage of the fact that bsub writes a line containing the job ID to stdout.
function nk_jobid {
    # Run the submission command given as arguments, keep its output,
    # and cut the ID out of the "Job <12345> is submitted ..." line
    output=$("$@")
    echo "$output" | head -n1 | cut -d'<' -f2 | cut -d'>' -f1
}
Usage:
jobid=$(nk_jobid bsub pwd)
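And to build the dependency for the next job, something like this sketch:

# Capture the first job's numeric ID, then make the second job wait on it
jobid=$(nk_jobid bsub pwd)
bsub -w "done($jobid)" pwd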
If you are using C or C++, you can use lsblib, the LSF C API, to submit jobs. The input and output are structs; in particular, the output struct contains the job ID.
#include <lsf/lsbatch.h>
LS_LONG_INT lsb_submit (struct submit *jobSubReq, struct submitReply *jobSubReply)
$jobid = "0"
bsub pwd > $jobid
cat $jobid
If you just want to view the job ID after submission, most of the time I just use bhist or bhist -l to view running jobs and their details.
$ bhist
Summary of time in seconds spent in various states:
JOBID  USER   JOB_NAME  PEND  PSUSP  RUN     USUSP  SSUSP  UNKWN  TOTAL
8664   F14r3  sample    2     0      187954  0      0      0      187956