I have a program that will run for a very long time on my university's LSF cluster. I don't know whether it will finish before it exceeds its job's time limit. If a job exceeds the time limit, the LSF system sends increasingly unfriendly termination signals to the program before it is finally killed. I have programmed the code to catch the USR2 signal and save its data; however, saving takes a few minutes. My university's guide to using the LSF system states that the option
-ta USR2 -wt [hh:]mm
extends the time limit the program has to react to USR2.
I have already tried the following options:
-ta USR2 -wt '00:20'
-ta USR2 -wt 00:20
-ta USR2 -wt 20
-ta USR2 -wt '20'
and all of the above with USR2 replaced by 'USR2'.
I hoped that the job would be submitted, but instead I get the following error:
a: Bad time specification. Job not submitted.
I think that you want
-wa USR2 -wt 20
-ta isn't a bsub option, so bsub thinks you're asking for a termination deadline (-t) with a time specification of a, hence the error message
a: Bad time specification. Job not submitted.
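For example, a full submission along these lines should be accepted; the executable name and the 4-hour run limit requested with -W are placeholders for whatever your job actually needs:
bsub -W '4:00' -wa 'USR2' -wt '20' ./my_program
LSF should then deliver USR2 roughly 20 minutes before the run limit, which is the window your save routine gets to write its data.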
I am looking for some help with a way to run a swift command in the terminal to run a program, stop it after 1 hour, and then restart it.
Example of manual process:
Open Terminal.
cd my app
swift run my program --with-parameters
ctrl+c (after 1 hour)
Restart with step 3
I am sure there must be a way, perhaps using a bash script, to start the program with a command, kill it after 60 minutes, and restart it in a continuous loop like that.
Thanks :-)
You can set up a cron job to do this. Basically, you'll have a bash script, say it's located at /Users/ben/scripts/run_my_program.sh, that will, every hour:
Terminate the currently running process (kill pid)
Execute swift run my program --with-parameters and record the process ID
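A hypothetical crontab entry for that approach could look like the following; the script path is the one mentioned above, and the 0 * * * * schedule means "at minute 0 of every hour":
0 * * * * /Users/ben/scripts/run_my_program.sh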
You can get the PID of the swift process you launch with echo $!, then use sleep 1h to wait for one hour, and finally kill the process with kill -9 and the PID you captured in the first step.
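Here is a minimal sketch of that loop approach as a bash script; the project path and the executable target name are placeholders, so substitute your own:
#!/bin/bash
# Hypothetical wrapper: run the program for an hour, stop it, start it again.
cd /path/to/my-app || exit 1

while true; do
    swift run MyProgram --with-parameters &   # start the program in the background
    pid=$!                                    # remember its PID
    sleep 1h                                  # let it run for one hour
    kill "$pid" 2>/dev/null                   # send TERM; escalate to kill -9 only if it ignores TERM
    wait "$pid" 2>/dev/null                   # reap the old process before restarting
done
Run it once in a terminal (or under nohup) and it will keep cycling until you stop the script itself with ctrl+c.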
I have a periodic Celery task in which I do quite a bit of printing to stdout.
For example:
print(f"Organization {uuid} updated")
All of these print statements look like this in my worker output:
[2019-10-31 10:36:00,466: DEBUG/MainProcess] basic.qos: prefetch_count->102
The counter at the end is incremented for each print. Why is this? What would I have to change to see stdout?
I run the worker as such:
$ celery -A project worker --purge -l DEBUG
Do not call print() in your tasks. Instead, create a logger with logger = get_task_logger(__name__) (imported from celery.utils.log) and use it in your tasks whenever you need to write something to the log at a certain level.
I'm using PBS Torque to run multiple jobs. The idea is simple: each job works on a chunk of data. The PBS Torque job script to launch a job is called run_appli.sh.
Here is the simple code (code 1) to launch 10 jobs
for i in 1 2 3 4 5 6 7 8 9 10; do qsub run_appli.sh; done
Indeed, I can monitor the execution of each of those jobs using qstat (see the command below) and get the elapsed time of each job.
watch -n1 -d qstat
However, I am interested in the overall elapsed time, that is, the time from when I launched all the jobs (code 1) to when the last job finished its execution.
Does anyone have an idea how to do this?
If you know the job id of the first job, you can look at its ctime (creation time, or the time it was queued). You can then check the end time from the last job's comp_time. The difference between the two is the total elapsed time.
qstat -f $first_job_id | grep ctime # Shows the first job's queued time
qstat -f $last_job_id | grep comp_time # Shows the final job's completion time.
If the last job isn't completed yet, then the running elapsed time is just the current time minus the first job's queue time.
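As a rough sketch, a bash script along these lines would compute that difference; the ctime and comp_time field names follow the qstat -f output used above, and it assumes GNU date can parse the timestamps qstat prints:
#!/bin/bash
# Usage: ./elapsed.sh <first_job_id> <last_job_id>
first_job_id=$1
last_job_id=$2

start=$(qstat -f "$first_job_id" | sed -n 's/.*ctime = //p')    # time the first job was queued
end=$(qstat -f "$last_job_id" | sed -n 's/.*comp_time = //p')   # time the last job completed

start_s=$(date -d "$start" +%s)   # convert both timestamps to seconds since the epoch
end_s=$(date -d "$end" +%s)

echo "Overall elapsed time: $(( end_s - start_s )) seconds"
If the last job is still running, replace end_s with $(date +%s) to get the elapsed time so far.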
I'm attempting to write a manager in Perl to automate a bioinformatics pipeline my lab has been using. (The REPET pipeline, for anyone who's interested.) The pipeline has eight steps, several of which are broken down into substeps which can be run in parallel. Most notably, step 3 is broken down into three parts, and step 4 into three corresponding parts. Each part of step 3 can be run independently, and its corresponding part in step 4 can be started as soon as its step 3 companion is finished. I'd like my manager to be able to launch step 3 in three parallel threads, and, for each thread, move on to step 4 as soon as step 3 is finished. The best way I can think to do that is to monitor the output of each process. The output from each step looks like this:
START TEdenovo.py (2012-08-23 11:20:10)
version 2.0
project name = dm3_chr2L
project directory = /home/<etc>
beginning of step 1
submitting job(s) with groupid 'dm3_chr2L_TEdenovo_prepareBatches' (2012-08-23 11:20:10)
waiting for 1 job(s) with groupid 'dm3_chr2L_TEdenovo_prepareBatches' (2012-08-23 11:20:10)
execution time per job: n=1 mean=2.995 var=0.000 sd=0.000 min=2.995 med=2.995 max=2.995
step 1 finished successfully
version 2.0
END TEdenovo.py (2012-08-23 11:20:25)
That's the output for step 1, but in step 3, when "step 3 finished successfully" appears in the output, it's safe to move on to step 4. The problem has been tracking the output of three of these processes as they run simultaneously. Essentially, this is the behavior that I want (pseudocode):
my $log31 = `TEdenovo.py [options] &`;
my $log32 = `TEdenovo.py [options] &`;
my $log33 = `TEdenovo.py [options] &`;
while(1) {
#start step 41 if $log31 =~ /step 3 finished successfully/;
#start step 42 if $log32 =~ /step 3 finished successfully/;
#start step 43 if $log33 =~ /step 3 finished successfully/;
#monitor logs 41, 42, 43 similarly
last if #all logs read "finished successfully"
sleep(5);
}
#move on to step 5
The problem is that invoking a process with backticks causes Perl to wait until that process has finished before moving on; as I discovered, it isn't like system(), where you can spin something into a background process with & and then proceed immediately. As far as I know, there isn't a good way to use system() to get the effect I'm looking for. I suppose I could do this:
system("TEdenovo.py [options] & > log31.txt");
And then poll log31.txt periodically to see whether "finished successfully" has appeared, but that seems unnecessarily messy.
I've also tried opening the process in a filehandle:
open(my $step3, "TEdenovo.py [options] |");
my @log3;
while(1)
{
push(@log3, <$step3>);
last if grep(/step 3 finished successfully/, @log3);
sleep(5);
}
...but, once again, Perl waits until the process has finished in order to move on (in this case, at the push()). I tried the above with $| both set and unset.
So, the essence of my question is: Is there a way to capture the standard output of a running background process in perl?
Maybe you could try
open(my $step3, "TEdenovo.py [options] |");
while(<$step3>)
{
last if /step 3 finished successfully/;
}
instead of while(1) ?
The approach of using open and reading from the pipe handle is the correct one. If Nahuel's suggestion of reading from the handle in scalar context doesn't help, you could still be suffering from buffering.
$| changes the buffering behavior of Perl's output, but not the behavior of any external programs called from Perl. You have to use an external program that doesn't buffer its output. In this case, I believe this is possible by passing the -u option to python:
open(my $step3, "|-", "python -u TEdenovo.py [more options]");
I have a simple Perl script that simply prints a line of text to stdout. What I want to accomplish is that while this script runs, if I (or someone else) issues a signal to that process to stop, I want it to trap that signal and exit cleanly. The code I have looks like the following
#!/usr/bin/perl -w
$| = 1;
use sigtrap 'handler' => \&sigtrap, 'HUP', 'INT','ABRT','QUIT','TERM';
while(1){
print "Working...\n";
sleep(2);
}
sub sigtrap(){
print "Caught a signal\n";
exit(1);
}
While this works well when I actually hit ctrl-c from the command line, if I issue a
kill -9 <pid>
it just dies. How do I get it to execute something before exiting? My general idea is to use this framework to detect when this script is killed on a server due to a server reboot for maintenance or failure.
Thanks much in advance
Signal #9 (SIGKILL) cannot be trapped. That's how Unix is designed.
But the system does not send that signal when shutting down for maintenance, at least if your daemon behaves correctly. It will normally send the TERM signal (or, more precisely, whatever your daemon-handling script in /etc/init.d sends). Only processes that do not shut down correctly after a timeout will receive SIGKILL.
So your aim should be to correctly handle the TERM signal and to write the wrapper script in /etc/init.d that will be called when the system is changing runlevel.
Update: You can use the Daemon::Control module for the init script.
You're sending two very different signals to your process. Pressing Ctrl-C in the console usually sends the process an INT signal (SIGINT), which, judging by your code, is caught and handled. kill -9, though, explicitly sends signal number 9, which is called KILL. This is one of the signals whose handling cannot be redefined, and delivery of this signal always immediately ends the process; that is done by the kernel itself.
As far as I know, you can't catch kill -9. Try kill <pid> instead.
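A quick way to see the difference from a shell; this assumes the Perl script above is saved as worker.pl (a placeholder name):
perl worker.pl &      # start it in the background
pid=$!
kill -TERM "$pid"     # trappable: the handler prints "Caught a signal" and exits cleanly
# kill -9 "$pid"      # SIGKILL: cannot be trapped, the kernel ends the process immediately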