In Perl, is there a way to restart the currently running program from within itself?

I am running a program in Perl that at one point evaluates data in an if statement called from within a subroutine, e.g.
sub check_good {
    if (!$good) {
        # exit this subroutine
        # restart program
    }
    else {
        # keep going
    }
} # end sub
The problem I have is with exiting and restarting. I know that I can just use exit 0; to exit straight out, but obviously this is not correct if I want to go back to the beginning. I tried calling the subroutine which essentially starts the program, but of course once it has run it will go back to this point again.
I thought about putting it in a while loop, but this would mean putting the whole file in the loop and it would be very impractical.
I don't actually know whether this is possible, so any input would be great.

If you have not changed @ARGV, or you keep a copy of it, you could possibly do something like exec($^X, $0, @ARGV).
$^X and $0 (or $EXECUTABLE_NAME and $PROGRAM_NAME under use English) are the current perl interpreter and the current perl script, respectively.
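For instance, a minimal sketch of that restart applied to the question's check_good (assuming $good holds the result of the check; the saved copy guards against later changes to @ARGV):
#!/usr/bin/perl
use strict;
use warnings;

my @original_argv = @ARGV;   # saved early, before anything can modify @ARGV

sub check_good {
    my ($good) = @_;
    return if $good;   # keep going
    # Not good: replace this process with a fresh copy of itself.
    exec($^X, $0, @original_argv) or die "exec failed: $!";
}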

An alternative would be to always have two processes: a supervisor and a worker.
Refactor all your logic into a subroutine called run (or main, or whatever). When your real logic detects that it needs to restart, it should exit with a predefined non-zero exit code (1, for example).
Then your main script and supervisor would look like this:
my $worker = fork;
die "fork failed: $!" unless defined $worker;
if ($worker == 0) {
    # child (worker) process
    run(@ARGV);
    exit 0;
}
# supervisor process
waitpid $worker, 0;
my $status = ($? >> 8);
if ($status == 1) { ... restart ... }
exit $status; # propagate exit code...
In the simple scenario where you just want to restart once, this might be a bit overkill. But if you at any point need to be able to handle other error scenarios this method might be preferable.
For example, if the exit code is 255, this indicates that the main script called die(). In this case you might want to implement some decision procedure: whether to restart the script, ignore the error, or escalate the issue.
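A hedged sketch of such a decision procedure in the supervisor (run_worker and the retry limit are illustrative, not part of the answer above):
my $restarts = 0;
while (1) {
    my $status = run_worker();   # fork + waitpid as above, returning $? >> 8
    if    ($status == 0)   { exit 0 }          # clean exit
    elsif ($status == 1)   { next }            # worker asked for a restart
    elsif ($status == 255) {                   # worker called die()
        die "giving up after $restarts failed runs\n" if ++$restarts > 5;
        next;                                  # ...or escalate instead
    }
    else { exit $status }                      # propagate anything else
}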
There are quite a few modules on CPAN implementing such supervisors. Proc::Launcher is one of them, and its manual page includes an extensive discussion of related works. (I have never used Proc::Launcher; it is mainly due to this discussion that I'm linking to it.)

There's nothing to stop you calling system on yourself. Something like this (clearly in need of a tidy), where I pass in a command-line argument to prevent the code calling itself forever.
#!/usr/bin/perl
use strict;
use warnings;

print "Starting...\n";
sleep 5;

if (!@ARGV) {
    print "Start myself again...\n";
    system("./sleep.pl secondgo");
    print "...and die now\n";
    exit;
} elsif (@ARGV && $ARGV[0] eq "secondgo") {
    print "Just going to die straightaway this time\n";
    exit;
}

Related

perl - two stage conditional compilation

I have a pretty big Perl script executed quite frequently (from cron).
Most executions require only pretty short & simple tests.
How can I split the single-file script into two parts, with "part two" compiled only when "part one" decides it is needed?
Solutions considered:
using a BEGIN{ …; exit if …; } block for the trivial tests.
a two-file solution, with file_1 using require to compile & execute file_2.
I would prefer a single-file solution to ease maintenance, if the cost is reasonable.
First, you should measure how long the compilation really takes, to see if this "optimization" is even necessary. If it does happen to be, then since you said you'd prefer a one-file solution, one possibility is to use the __DATA__ section for code, like so:
use warnings;
use strict;

# measure compilation and execution time
use Time::HiRes qw/ gettimeofday tv_interval /;
my $start;
BEGIN { $start = [gettimeofday] }
INIT  { printf "%.06f\n", tv_interval($start) }
END   { printf "%.06f\n", tv_interval($start) }

my $condition = 1; # dummy for testing

# conditionally compile and run the code in the DATA section
if ($condition) {
    eval do { local $/; <DATA>.'; 1' } or die $@;
}

__DATA__
# ... lots of code here ...
I see two ways of achieving what you want. The simple one would be to divide the script into two parts: the first part does the simple tests; then, if you need to do more complicated tests, you may "add" the second part. The way to do this is to use eval, like this:
<first-script.pl>
...
eval `cat second-script.pl`;
if ($@) {
    print STDERR $@, "\n";
    die "Errors in the second script.\n";
}
Or using File::Slurp in a more robust way:
eval read_file("second-script.pl", binmode => ':utf8');
Or, following @amon's suggestion:
do "second-script.pl";
Only beware that do is different from eval in this way:
It also differs in that code evaluated with do FILE cannot see lexicals in the enclosing scope; eval STRING does. It's the same, however, in that it does reparse the file every time you call it, so you probably don't want to do this inside a loop.
The eval will execute in the context of the first script, so any variables or initializations will be available to that code.
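A small sketch of that scoping difference (the file name second-script.pl is from the example above):
use File::Slurp qw(read_file);

my $config = 'debug';   # a lexical in the enclosing scope

# eval STRING runs in the current scope, so the evaluated code CAN see $config:
eval read_file('second-script.pl');

# do FILE compiles the file in its own scope, so its code CANNOT see $config:
do 'second-script.pl';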
Related to this, there is this question: Best way to add dynamic code to a perl application, which I asked some time ago (and answered myself with the help of the comments provided and some research.) I took some time to document everything I could think of for anyone (and myself) to refer to.
The second way I see would be to turn your testing script into a daemon and have the crontab bit call this daemon as necessary. The daemon remains alive, so any data structures that you may need will remain in memory. On the down side, the daemon process will always be running, consuming resources continuously.

Perl crashing with Parallel::ForkManager and WWW::Mechanize

I have written a Perl Script using WWW::Mechanize which reads URLs from a text file and connects to them one by one. In each operation, it parses the content of the webpage looking for some specific keywords and if found, it will be written to the output file.
To speed up the process, I used Parallel::ForkManager with MAX_CHILDREN set to 3. Though I have observed an increase in the speed, the problem is that, after a while the script crashes. Perl.exe process gets killed and it does not display any specific error message.
I have run the script multiple times to see if it always fails at the same point, however the point of failure seems to be intermittent.
Please note that I have already taken care of any memory leaks in WWW::Mechanize and HTML::TreeBuilder::XPath as follows:
For WWW::Mechanize, I set stack_depth(0) so that it does not cache the history of visited pages.
For HTML::TreeBuilder::XPath, I delete the root node once I am done with it. This approach helped me in resolving a memory-leak issue in another similar script which does not use fork.
Here is the structure of the script, I have mentioned only the relevant parts here, please let me know if more details are required to troubleshoot:
#! /usr/bin/perl
use warnings;
use diagnostics;
use WWW::Mechanize;
use HTML::TreeBuilder::XPath;
use Parallel::ForkManager;

use constant MAX_CHILDREN => 3;

open(INPUT, "<", $input)    || die("Couldn't read from the file, $input with error: $!\n");
open(OUTPUT, ">>", $output) || die("Couldn't open the file, $output with error: $!\n");

my $pm = Parallel::ForkManager->new(MAX_CHILDREN);
my $mech = WWW::Mechanize->new();
$mech->stack_depth(0);

while (<INPUT>) {
    chomp $_;
    my $url = $_;
    $pm->start() and next;   # parent continues the loop; child runs the body below
    $mech->get($url);
    if ($mech->success) {
        my $tree = HTML::TreeBuilder::XPath->new();
        $tree->parse($mech->content);
        # do some processing here on the content and print the results to OUTPUT file
        # once done, delete the root node
        $tree->delete();
    }
    $pm->finish();
    print "Child Processing finished\n"; # it never reaches this point!
}
$pm->wait_all_children;
I would like to know why this Perl script keeps failing after a while.
For understanding purposes, I added a print statement right after the finish method of the fork manager; however, it does not print.
I have also used the wait_all_children method, since, as per the module's documentation on CPAN, it waits for the processing to get over for all the children of the parent process.
I have not understood why the wait_all_children method is placed outside the while or for loop (as observed in the documentation as well), since all the processing is taking place inside the loop.
Thanks.
As for why this code is written with a main job loop containing the start and finish calls, followed by a wait_all_children outside the loop: it works like this:
1. The parent process gets the next job from <INPUT> at the start of each loop.
2. The parent runs start, which causes the child process to fork. At this point, you have two processes, each of which is running the exact same code at the exact same point.
3a. The parent process hits the and next and jumps back to the top to read the next <INPUT> and start the process over.
3b. The child process does not hit the and next and continues running the code you gave it until it hits finish, where the child exits.
Meanwhile the parent process is busy going through the loop and creating a child each time through. After forking 3 children (or whatever you set the limit to), it blocks until one of the children exits, at which point it immediately spawns a new child (resulting in step 3b for each child each time).
When the parent runs out of jobs, it falls out of the while loop (never having run anything inside it itself) and then waits for all the remaining children to exit.
As you can see, any code in the loop after finish is never going to run in either the parent (because it does nothing after the and next within the loop) or the children (because they exit at finish).
I've never used Parallel::ForkManager, but it looks like you can register a run_on_finish hook to run some code as each child finishes, if you want to put a print statement there at the end.
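For example, a short sketch using that hook (the callback signature matches Parallel::ForkManager's documentation; register it before the loop):
$pm->run_on_finish(sub {
    my ($pid, $exit_code, $ident) = @_;
    print "Child $pid finished with exit code $exit_code\n";
});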
To find the problem, though, I'd suggest wrapping all the code between start and finish in an eval or use Try::Tiny and warn out the error to see if there's an exception happening in there that's breaking it. I'd expect such things to show up in STDERR when the child dies, though, so I'm not really sure that will help.
However, it's worth a shot. Here's my suggestion in code, just showing the part I'd catch exceptions from:
# At the top add
use Try::Tiny;

# Later, in your main loop
$pm->start() and next;
try {
    $mech->get($url);
    if ($mech->success) {
        my $tree = HTML::TreeBuilder::XPath->new();
        $tree->parse($mech->content);
        # do some processing here on the content and print the results to OUTPUT file
        # once done, delete the root node
        $tree->delete();
    }
}
catch {
    warn "Bad Stuff: ", $_;
};
$pm->finish();
That might help show you what's gone wrong.
If it does not help, you might try moving the try block to include more of the program (like nearly all of it after the use Try::Tiny line) and see if that elucidates anything.
The $pm->wait_all_children; call waits for ALL the child processes to end, and blocks until they do. I am not sure what kind of error handling you have done for $mech inside the if() statement, but you may want to revisit that.
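For instance, a one-line sketch: autocheck is a documented WWW::Mechanize option that makes get() die on HTTP errors instead of failing silently, which pairs well with the eval/Try::Tiny wrapper above:
my $mech = WWW::Mechanize->new( autocheck => 1 );   # get() now dies on request failures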

Force Perl to call END subroutines when ending with exec()?

When you use exec() in Perl:
Note that exec will not call your END blocks, nor will it invoke DESTROY methods on your objects.
How do I force perl to call END blocks anyway? Can I do something like END(); exec($0) or whatever?
I really am trying to make the program end its current instance and start a brand new instance of itself, and am too lazy to do this correctly (using cron, or putting the entire program in an infinite loop). However, my END subroutines clean up temp files and other important things, so I need them to run between executions.
Unhelpful links to code:
https://github.com/barrycarter/bcapps/blob/master/bc-metar-db.pl
https://github.com/barrycarter/bcapps/blob/master/bc-voronoi-temperature.pl
https://github.com/barrycarter/bcapps/blob/master/bc-delaunay-temperature.pl
So you're trying to execute a program within your script? exec probably isn't what you want then. exec behaves like the C exec: what gets called replaces your current process; to keep going, you'd have to do something like a fork to preserve your current process while executing another.
But good news! That all exists in the system builtin.
Does exactly the same thing as exec LIST , except that a fork is done first and the parent process waits for the child process to exit.
Here's what it looks like:
use 5.012;   # or newer
use warnings;

...   # some part of my program

system($my_command, $arg1, $arg2);   # forks, execs, waits, and returns

END {
    # still gets called, because you never left the script
}
If you absolutely must use an exec, you must call your cleanup routine yourself. To understand more about END, see perldoc perlmod for full details. The short of it: END is one of several types of code blocks that get executed at a particular stage in the execution of the script. They are NOT subroutines, but you can execute any code you want in those blocks. So you can do:
sub cleanup { ... }   # your cleanup code

sub do_exec {
    cleanup();
    exec( ... );
}

END {
    cleanup();
}
and then you know your cleanup code will be executed at either script exit OR when you do your exec.
To answer the narrow question of how to invoke your END blocks at arbitrary times, you can use the B::end_av method with B::SV::object_2svref to get the code references to your END blocks.
sub invoke_end_blocks_before_exec {
    use B;
    my @ENDS = B::end_av->ARRAY;
    foreach my $END (@ENDS) {
        $END->object_2svref->();
    }
}
END { print "END BLOCK 1\n" }
END { print "END BLOCK 2\n" }
...
invoke_end_blocks_before_exec();
exec("echo leave this program and never come back");
Output:
END BLOCK 2
END BLOCK 1
leave this program and never come back
I would usually prefer something less magical, though. Why not a structure like
sub cleanup { ... }

END { &cleanup }

if (need_to_exec()) {
    cleanup();   # same thing END was going to do anyway
    exec( ... );
}
?
Fork and exec
It'll leave you with a new pid, but you could do a fork/exec:
my $pid = fork();
defined $pid or die "fork failed";
exit if $pid; # parent immediately exits, calling END blocks.
exec($0) or die "exec failed"; # child immediately execs, will not call END blocks (but parent did, so OK)
This strikes me as far less fragile than mucking with internals or trying to make sure your exec is in the final END block.
Wrap your program
Also, it is trivial to just wrap your Perl program in a shell (or Perl) script that looks something like this:
#!/bin/sh
while sleep 5m; do
    perl your-program.pl
done
or
#!/usr/bin/perl
while (1) {
    system("perl your-program.pl");
    sleep(5 * 60);
}
Can you put your call to exec at the end of the (final) END block? Where your current call to exec is, set a flag, then exit. At the end of the END block, check the flag, and if it's true, call exec there. This way, you can exit your script without restarting if necessary, and still have the END blocks execute.
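A sketch of that flag idea (the variable name is illustrative; remember END blocks run in reverse order of definition, so a block meant to fire last must be defined first):
our $need_restart = 0;

# Defined FIRST so it runs LAST, after the cleanup END blocks:
END {
    exec($^X, $0, @ARGV) if $need_restart;
}

# ... later, where you currently call exec:
$need_restart = 1;
exit;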
That said, I'd recommend not implementing this type of process-level tail recursion.

Terminating a system() after certain amount of time in Windows

I'm running a command-line application from within the Perl script (using system()) that sometimes doesn't return; to be precise, it throws an exception which requires user input to abort the application. This script is used for automated testing of the application I'm running via the system() command. Since it is part of automated testing, the system() command has to return if the exception occurs, and the test should be considered a failure.
I want to write a piece of code that runs this application and, if an exception occurs, continues with the script, considering this test to have failed.
One way to do this is to run the application for a certain period of time, and if the system call doesn't return within that period, terminate it and continue with the script.
(How can I terminate a system command with alarm in Perl?)
Code for achieving this:
my $ret;
eval {
    local $SIG{ALRM} = sub { die "Timeout\n" };
    alarm 60;
    $ret = system("testapp.exe");
    alarm 0;
};
if ($@) {
    print "Test Failed";
} else {
    # compare the returned value with the expected one
}
But this code doesn't work on Windows. I did some research on this and found out that signal handling doesn't work on Windows (see the book Programming Perl).
Could someone suggest how I could achieve this on Windows?
I would recommend looking at the Win32::Process module. It allows you to start a process, wait on it for some variable amount of time, and even kill it if necessary. Based on the example the documentation provides, it looks quite easy:
use Win32::Process;
use Win32;

sub ErrorReport {
    print Win32::FormatMessage( Win32::GetLastError() );
}

my $ProcessObj;
Win32::Process::Create($ProcessObj,
                       "C:\\path\\to\\testapp.exe",
                       "",
                       0,
                       NORMAL_PRIORITY_CLASS,
                       ".") || die ErrorReport();

if ($ProcessObj->Wait(60000))   # timeout is in milliseconds
{
    # Wait succeeded (process completed within the timeout value)
}
else
{
    # Timeout expired; $! is set to WAIT_FAILED in this case
}
I'm not exactly sure if the NORMAL_PRIORITY_CLASS creation flag is the one you want to use; the documentation for this module is pretty bad. I see some examples using the DETACHED_PROCESS flag, so you'll have to play around with that part to see what works. You could also sleep for the appropriate number of seconds and then use this module's Kill method, as sketched below.
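For example (a sketch: Kill is Win32::Process's terminate method; the exit code handed to it is arbitrary):
unless ($ProcessObj->Wait(60000)) {   # wait up to 60 seconds
    $ProcessObj->Kill(1);             # force-terminate with exit code 1
}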
See Proc::Background; it abstracts the code for both Win32 and Linux. The relevant function is timeout_system( $seconds, $command, $arg, $arg, $arg ).
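A minimal sketch of that call (assuming testapp.exe is on your PATH):
use Proc::Background qw(timeout_system);

# Run testapp.exe; kill it if it is still alive after 60 seconds.
my $wait_status = timeout_system(60, 'testapp.exe');
my $exit_code   = $wait_status >> 8;
print $exit_code == 0 ? "Test passed\n" : "Test failed\n";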

Killing an application started using system() in Perl

I am trying to run an application inside a Perl script using system(). The application I'm running gets stuck sometimes (it enters some kind of infinite loop). Is there a way I can know if this application is stuck and kill it to continue with the Perl script?
I'm trying to do something like this:
start testapp.exe;
if (stuck with testapp.exe) {
    kill testapp.exe;
}
Determining whether "it is stuck in an infinite loop" is known as the Halting Problem and is undecidable in general.
If you want to kill it, you will have to run the application in a child process using fork, and then kill that child from the parent if it runs for too long.
You can determine whether the process has been running too long like this:
use POSIX ":sys_wait_h";
waitpid($pid, WNOHANG) > 0   # waitpid returns 0 if the child is still running
at least, according to this manual page.
I am not sure how well it works on various systems; you can try it out.
Not a direct answer, but I can recommend the forks module if you want to fork with ease; note it works only on UNIX systems (not Windows).
OK, more helpful code :) It works on UNIX; according to the perlfork perldoc, it should work on Windows in exactly the same way.
use warnings;
use strict;
use POSIX ":sys_wait_h";

my $exited_cleanly;   # to this variable I will save the info about exiting

my $pid = fork;
die "fork failed: $!" unless defined $pid;
if (!$pid) {
    # child: run the long program, then exit so it doesn't fall through
    system("anything_long.exe");   # your long program
    exit 0;
} else {
    sleep 10;                             # wait 10 seconds (can be longer)
    my $result = waitpid(-1, WNOHANG);    # returns 0 if the child is still running
    if ($result == 0) {                   # it is still running
        $exited_cleanly = 0;              # I already know I had to kill it
        kill('TERM', $pid);               # kill it with TERM ("cleaner") first
        sleep(1);                         # wait a bit to see if it ends
        my $result_term = waitpid(-1, WNOHANG);
        # did it end?
        if ($result_term == 0) {          # if it still didn't...
            kill('KILL', $pid);           # kill it with full force!
        }
    } else {
        $exited_cleanly = 1;              # it exited cleanly
    }
}
# you can now say something to the user, for example
if (!$exited_cleanly) {...}
system("start testapp")
is short for
system("cmd", "/c", "start testapp")
Perl just knows about cmd; it doesn't know anything about start, much less about testapp. system is not the tool you want. That's the first problem.
The second problem is that you haven't defined what it means to be "stuck". If you want to monitor a program, it needs a heartbeat. A heartbeat is a periodic activity that can be externally examined. It can be writing to a pipe. It can be changing a file. Anything.
The monitoring program listens for this heartbeat, and presumes the program is dead if the heart stops beating, so to speak.
"Killing" is done using signals in unix, but it's done using TerminateProcess in Windows. The third problem is that Perl core does not give you access to that function.
The solution to the first and third problem is Win32::Process. It allows you to launch a process in the background, and it also allows you to terminate it.
Creating a heartbeat is up to you.
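To make that concrete, here is a hedged sketch of a file-based heartbeat (file name and staleness threshold are arbitrary choices):
# In the monitored program: touch the heartbeat file periodically.
open my $hb, '>', 'heartbeat.txt' or die "can't write heartbeat: $!";
print {$hb} time();
close $hb;

# In the monitor: treat the program as stuck if the file has gone stale.
my $mtime = (stat 'heartbeat.txt')[9];   # last-modified time
if (!defined $mtime or time() - $mtime > 30) {
    # heart stopped beating, so to speak; terminate the process
}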
Here is one way you can handle the problem: if you know that testapp should not take more than N seconds to do its thing, you can use a timeout to kill the app by way of IPC::Run.
In the example below there is a timeout of 1 second which kills the sleep 10 command that takes too long (longer than the timeout of 1 second). If this doesn't do what you want, then you should provide more information about how you can detect that testapp.exe is "stuck".
#!/usr/bin/env perl
use IPC::Run qw( run timeout );

my ($in, $out, $err);
eval {   # if (stuck with testapp.exe for more than N seconds)
    my @cmd = ('sleep', '10');   # this could be testapp.exe instead of sleep
    run \@cmd, \$in, \$out, \$err, timeout( 1 ) or die "test";   # start testapp.exe
    print "do stuff if cmd succeeds\n";
};
print "more stuff to do afterwards whether or not command fails or succeeds\n";
You can't determine that the application is stuck if you execute it like that, because the system statement won't return until the application terminates.
So, at least, you need to start the test application so it can run asynchronously from the Perl script that is to monitor it.
Having resolved that part of the problem, you have to establish a mechanism that will allow the monitoring Perl script to determine that the application is stuck. That is a non-trivial exercise, and likely system dependent, unless you adopt a simple expedient such as requiring the application to write a heart-beat indication somewhere, and the Perl script monitors for the heart-beat. For example (not necessarily a good example), the application could write the current time into a file identified by its PID, and the Perl script could monitor the file to see if the heart-beat is sufficiently recent. Of course, this assumes that the 'infinite loop' doesn't include code that writes to the heart-beat file.