For the past two months, I have been trying to find out why I cannot submit a job on our HPC (using qsub). Recently, I found out that my home directory is
/export/home/wrfuser
while my co-workers' home directories look like
/home/wrfuser1
(note the extra /export in mine).
I can submit a job, but it never produces a result. Here's my sample hello.qsub:
#!/bin/bash --login
#PBS -j oe
#PBS -l walltime=00:01:00,nodes=1,ppn=1,mem=50mb
export WORKDIR=/mnt/NFS003/WRF/WRF_hist/qsub_test
cd ${WORKDIR}
echo "HELLO WORLD"
[wrfuser@HPC qsub_test]$ vi hello.qsub
[wrfuser@HPC qsub_test]$ qsub hello.qsub
Your job 7618 ("hello.qsub") has been submitted
[wrfuser@HPC qsub_test]$ qstat
job-ID  prior    name        user     state  submit/start at      queue  slots  ja-task-ID
7617    0.55500  hello.qsub  wrfuser  Eqw    04/06/2018 10:21:35         1
7618    0.55500  hello.qsub  wrfuser  Eqw    04/06/2018 10:35:15         1
[wrfuser@HPC qsub_test]$
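For reference, Eqw in this qstat listing is Grid Engine's "error, queued, waiting" state, and the scheduler records why. A quick way to check, using job 7618 from the listing above:
qstat -j 7618 | grep -i error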
If it's not possible to do that from /export/home, is there any other way to submit a job on the HPC?
I solved it! I changed my qsub script to:
#!/bin/bash
#
#$ -cwd
#$ -j y
#$ -S /bin/bash
#$ -pe orte 64
echo "HELLO JOHN"
mkdir Hello_world
[wrfuser@CADHPC01 run]$
My previous script requested nodes, ppn, and memory, while the new one requests a number of cores instead (#$ -pe orte 64). However, I am not 100% sure that this is the main reason for the error.
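For reference, a rough mapping between the old PBS directives and their Grid Engine equivalents; resource names such as h_vmem and the orte parallel environment depend on how the site is configured, so treat these as assumptions to verify:
#PBS -j oe                 ->  #$ -j y              (merge stdout and stderr)
#PBS -l walltime=00:01:00  ->  #$ -l h_rt=00:01:00  (run-time limit)
#PBS -l nodes=1,ppn=1      ->  #$ -pe orte 1        (slot count; omit -pe for a serial job)
#PBS -l mem=50mb           ->  #$ -l h_vmem=50M     (memory limit, if h_vmem is defined)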
I am a newbie here on Stack Overflow, and it feels like I will learn and enjoy a lot here! Thanks! :D
I am running a greedy feature selection algorithm, and I am attempting to use job arrays to explore parallelization.
The idea is that we have three steps that depend on the previous step:
Step 1: Setup for iteration i
Step 2: Fit models at iteration i
Step 3: Find best model at iteration i
Because you need all the models (>10) to have finished training before starting step 3, plain old job chaining is not optimal.
So I am trying to use job arrays, which do exactly what I want: only when all my models are fitted do I move to step 3.
However, I am having trouble setting up the dependency.
I was told that the dependency for a whole job array needs to be the job ID (which is a number) and not the job name (e.g. runSetup$n_subject$i).
So: how do I get the job ID of the whole job array?
Or better yet: how do I best set a dependency on a whole job array?
This answer is very interesting, but doesn't tell me how to best set a dependency when my job array contains 10 or more jobs.
#!/bin/bash
# Subject to consider
n_subject=$1 # takes input arguments from the command line
cohort=$2
priors_and_init=$3

nparam=16

for ((i = 1; i <= nparam; i++)); do
    # Run setup
    if [[ $i -eq 1 ]]; then
        bsub -J "runSetup$n_subject$i" matlab -singleCompThread -nodisplay -r "setup_greedy_forward($n_subject,$cohort, $priors_and_init, $i)"
    else
        last_iter=$((i-1))
        bsub -J "runSetup$n_subject$i" -w "done(saveBest$n_subject$last_iter)" matlab -singleCompThread -nodisplay -r "setup_greedy_forward($n_subject,$cohort, $priors_and_init, $i)"
    fi

    # Fit models
    max_sim=$((nparam-i+1))
    bsub -W 08:00 -J "fitDCMs$n_subject[1-$max_sim]" -w "done(runSetup$n_subject$i)" -R "rusage[mem=16000]" matlab -singleCompThread -nodisplay -r "fit_dcm_greedy_forward($n_subject,$cohort, $priors_and_init, \$LSB_JOBINDEX)"

    # Extracting the job ID from the fitDCMs jobs
    # Then: for all trained DCMs, get the best model and save it
    JOBID=$(get_jobid bsub -W 08:00 -J "fitDCMs$n_subject[1-$max_sim]" -w "done(runSetup$n_subject$i)" -R "rusage[mem=16000]" matlab -singleCompThread -nodisplay -r "fit_dcm_greedy_forward($n_subject,$cohort, $priors_and_init, \$LSB_JOBINDEX)" 2> /dev/null)
    if [ -n "$JOBID" ]; then
        bsub -J "saveBest$n_subject$i" -w "numdone($JOBID,*)" matlab -singleCompThread -nodisplay -r "save_best_model($n_subject,$cohort, $priors_and_init, $i)"
    fi
done
The output I am getting:
MATLAB job.
Job <94564566> is submitted to queue <normal.24h>.
MATLAB job.
Job <94564567> is submitted to queue <normal.24h>.
MATLAB job.
saveBest121: No matching job found. Job not submitted.
MATLAB job.
runSetup122: No matching job found. Job not submitted.
[…]
After searching a bit, I found a way to get the job ID.
JOBID=$(bsub command1 | awk '/is submitted/{print substr($2, 2, length($2)-2);}')
if [ -n "$JOBID" ]; then
bsub -w "numdone($JOBID,*)" command2
fi
The first line submits the job and extracts its job ID.
This answer was found here.
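If this is needed in several places, the extraction can be wrapped in a small helper; the function name submit_job below is my own, for illustration:
submit_job () {
    # Run the given submission command and print only the numeric job ID
    "$@" | awk '/is submitted/{ gsub(/[<>]/, "", $2); print $2 }'
}

JOBID=$(submit_job bsub -W 08:00 -J "fitDCMs[1-10]" some_command)
bsub -w "numdone($JOBID,*)" some_followup_command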
I was told that the dependency for a whole job array needs to be the job ID (which is a number) and not the job name
That should work. For example:
bsub -J "iterate[1-10]" ...
bsub -J "finalize" -w "done(iterate)" ...
Job finalize won't start until all elements of iterate are done.
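The same dependency can also be expressed through the array's job ID, mirroring the numdone(...,*) form attempted in the question; the ID extraction is the awk approach shown above:
JOBID=$(bsub -J "iterate[1-10]" mycmd | awk -F'[<>]' '/is submitted/{print $2}')
bsub -J "finalize" -w "numdone($JOBID,*)" finalcmd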
I'm looking for a way to log information to a file about a submitted job immediately after it starts.
Normally all the job status is appended to the log file after a job has completed, but I'd like to know the information it has when it starts.
I know there's the -B flag but I want it in a file, and I could also do something like:
bsub -J jobby -o run_job.log "bjobs -l -J jobby > jobby.log; run_job"
but maybe someone knows of a funkier way of doing this.
There are some subtle variations that essentially accomplish the same thing:
You can use a pre-exec to do a similar thing instead of running bjobs as part of the command:
bsub -J jobby -E "bjobs -l -J jobby > jobby.log" run_job
You can use the job's environment to get your own job ID instead of using -J, if you write your submission as a script:
#!/bin/sh
#BSUB -o run_job.log
bjobs -l $LSB_JOBID > $LSB_JOBID.log
run_job
Then submit your job like this:
bsub < jobscript.sh
You can do some combination of the above: use $LSB_JOBID in a pre-execution script.
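A minimal sketch of that combination, assuming a small helper script of your own (log_start.sh is a made-up name and path):
#!/bin/sh
# log_start.sh -- pre-exec helper; LSF sets $LSB_JOBID when the job is dispatched
bjobs -l $LSB_JOBID > /tmp/$LSB_JOBID.log
Submitted with the pre-exec flag, so the log exists before run_job starts:
bsub -E "/path/to/log_start.sh" -o run_job.log run_job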
That's about as 'funky' as it gets AFAIK :)
I made a test script test.qsub:
#!/bin/bash
#PBS -q batch
#PBS -o output.txt
#PBS -e Error.err
echo "hello world"
When I run qsub test.qsub, it generates neither output.txt nor Error.err. I believe the other options do not work either; I'd appreciate your help! Supposedly you should configure torque.cfg, but in my installation that file was never generated, and it is not in /var/spool/torque.
Try "#PBS -k oe". This directs pbs to keep stdout and stderr.
I inherited a long bash script that I recently needed to modify. The bash script is run as a cronjob on a daily basis. I am decent with bash scripting, but I do not know much about Perl.
I had to substitute all "rm" commands with a call to a perl script that does something similar (for security purposes). This script was not written by me, so there is no -f flag to skip the confirmation prompt. Therefore, to automate this script I pipe "yes" to the script.
Here is an example where I am sequentially deleting two directories:
echo REMOVING FILES TO SAVE DISK SPACE
echo "yes | sudo nice -n -10 perl <path_to_delete_script.pl> -dir <del_dir1>"
yes | sudo nice -n -10 perl <path_to_delete_script.pl> -dir <del_dir1>
echo "yes | sudo nice -n -10 perl <path_to_delete_script.pl> -dir <del_dir2>"
yes | sudo nice -n -10 perl <path_to_delete_script.pl> -dir <del_dir2>
echo DONE.
In my output file, I see the following:
REMOVING FILES TO SAVE DISK SPACE
yes | sudo nice -n -10 perl <path_to_delete_script.pl> -dir <del_dir1>
yes | sudo nice -n -10 perl <path_to_delete_script.pl> -dir <del_dir2>
DONE.
It does not appear that the perl script has run. Yet when I copy and paste those two commands into the terminal, they both run fine.
Any help is appreciated. Thank you in advance.
You simply do:
yes | ./myscript.pl
Thanks for all the comments. I ended up changing the group and permissions of the tool and all output files. This allowed me to run the perl script without using "sudo," which others pointed out is bad practice.
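With the permissions fixed, the relevant lines no longer need sudo (paths kept as the placeholders from the question):
yes | nice -n 10 perl <path_to_delete_script.pl> -dir <del_dir1>
Note that the original nice -n -10 asks for a negative niceness (higher priority), which requires root anyway; an unprivileged user can only use non-negative values.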
Say I submit a job using something like bsub pwd. Now I would like to get the job ID of that job in order to build a dependency for the next job. Is there some way I can get bsub to return the job ID?
Nils and Andrey have the answers to this specific question in shell and C/C++ environments respectively. For the purposes of building dependencies, you can also name your job with -J then build the dependency based on the job name:
bsub -J "job1" <cmd1>
bsub -J "job2" <cmd2>
bsub -w "done(job1) && done(job2)" <cmd>
There's a bit more info here.
This also works with job arrays:
bsub -J "ArrayA[1-10]" <cmd1>
bsub -J "ArrayB[1-10]" <cmd2>
bsub -w "done(ArrayA[3]) && done(ArrayB[5])" <cmd>
You can even do element-by-element dependency. The following job's i-th element will only run when the corresponding element in ArrayB reaches DONE status:
bsub -w "done(ArrayB[*])" -J "ArrayC[1-10]" <cmd3>
You can find more info on the various things you can specify in -w here.
Just as a reference, this is the best solution I could come up with so far. It takes advantage of the fact that bsub writes a line containing the ID to STDOUT.
function nk_jobid {
    # Run the submission command given as arguments and capture its output
    output=$("$@")
    # bsub prints "Job <12345> is submitted to queue <...>"; pull out the number
    echo $output | head -n1 | cut -d'<' -f2 | cut -d'>' -f1
}
Usage:
jobid=$(nk_jobid bsub pwd)
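The captured ID can then go straight into a dependency expression, which was the original goal:
jobid=$(nk_jobid bsub pwd)
bsub -w "done($jobid)" <next_cmd>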
In case you are using C or C++, you can use LSBLIB, the LSF batch C API, to submit jobs. The inputs and outputs are structs, and the call returns the job ID:
#include <lsf/lsbatch.h>
LS_LONG_INT lsb_submit (struct submit *jobSubReq, struct submitReply *jobSubReply)
$jobid = "0"
bsub pwd > $jobid
cat $jobid
If you just want to view the JOBID after submission, most of the time I just use bhist or bhist -l to view the running jobs and their details.
$ bhist
Summary of time in seconds spent in various states:
JOBID   USER    JOB_NAME   PEND   PSUSP  RUN     USUSP  SSUSP  UNKWN  TOTAL
8664    F14r3   sample     2      0      187954  0      0      0      187956
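If the job is still pending or running, bjobs (used in the answers above) also shows the JOBID column directly; for example, filtering by job name:
bjobs -J sample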