Under what circumstances are END blocks skipped in Perl? - perl

I have a long-running program that used File::Temp::tempdir to create a temporary directory, and I sometimes interrupted it via ^C.
The following program prints the name of the temporary directory it creates and the name of a file in it.
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw[tempdir];
my $dir = tempdir(CLEANUP => 1);
print "$dir\n";
print "$dir/temp.txt\n";
`touch $dir/temp.txt`;
exit;
On OS X, this creates a directory inside /var/folders.
If the last line is exit; or die;, then the folder will get cleaned up and the temporary file inside it will get deleted.
However, if we replace the last line with sleep 20; and then interrupt the perl program via ^C, the temporary directory remains.
% perl maketemp.pl
/var/folders/dr/cg4fl5m11vg3jfxny3ldfplc0000gn/T/ycilyLSFs6
/var/folders/dr/cg4fl5m11vg3jfxny3ldfplc0000gn/T/ycilyLSFs6/temp.txt
^C
% stat /var/folders/dr/cg4fl5m11vg3jfxny3ldfplc0000gn/T/ycilyLSFs6/temp.txt
16777220 6589054 -rw-r--r-- 1 <name> staff 0 0 "Aug 1 20:46:27 2016" "Aug 1 20:46:27 2016" "Aug 1 20:46:27 2016" "Aug 1 20:46:27 2016" 4096 0 0
/var/folders/dr/cg4fl5m11vg3jfxny3ldfplc0000gn/T/ycilyLSFs6/temp.txt
%
Using a signal handler that just calls exit; does clean up the directory. E.g.
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw[tempdir];
$SIG{INT} = sub { exit; };
my $dir = tempdir(CLEANUP => 1);
print "$dir\n";
print "$dir/temp.txt\n";
`touch $dir/temp.txt`;
sleep 20;
As does using a "trivial" signal handler:
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw[tempdir];
$SIG{INT} = sub { };
my $dir = tempdir(CLEANUP => 1);
print "$dir\n";
print "$dir/temp.txt\n";
`touch $dir/temp.txt`;
sleep 20;
I tried looking through the source code (https://github.com/Perl-Toolchain-Gang/File-Temp/blob/master/lib/File/Temp.pm) to determine how tempdir registers a cleanup action.
Here's the exit handler installation
https://github.com/Perl-Toolchain-Gang/File-Temp/blob/master/lib/File/Temp.pm#L1716
which calls _deferred_unlink
https://github.com/Perl-Toolchain-Gang/File-Temp/blob/master/lib/File/Temp.pm#L948
which modifies the global hashes %dirs_to_unlink and %files_to_unlink, but uses the pid $$ as a key for some reason (probably in case the Perl interpreter forks? Not sure why that's necessary, though, since removing a directory seems like it would be an idempotent operation.)
The actual logic to clean up the files is here, in the END block.
https://github.com/Perl-Toolchain-Gang/File-Temp/blob/master/lib/File/Temp.pm#L878
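Incidentally, the likely reason for the pid key is not repeated removal (which, as noted above, would be idempotent) but premature removal: after a fork, both parent and child eventually run END blocks, and without the pid check a child's exit would delete directories the parent is still using. A minimal sketch of that scenario (my own experiment, not from File::Temp's docs):
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp qw(tempdir);
my $dir = tempdir(CLEANUP => 1);
my $pid = fork() // die "fork failed: $!";
if ($pid == 0) {
    # the child's END blocks run on exit, but File::Temp compares $$
    # against the pid that created the directory and skips cleanup
    exit;
}
waitpid($pid, 0);
print "directory still exists\n" if -d $dir;   # it should: the child skipped cleanup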
A quick experiment shows that END blocks are indeed skipped when perl is interrupted by an untrapped SIGINT:
sleep 20;
END {
    print "5\n";
}
# does not print 5 when interrupted
But they are run when a handler is installed:
$SIG{INT} = sub {};
sleep 20;
END {
    print "5\n";
}
# does print 5 when interrupted
So ... why does the END block get skipped after a SIGINT unless there's a signal handler, even one that seems like it should do nothing?

By default, SIGINT kills the process[1]. By kill, I mean the process is immediately terminated by the kernel. The process doesn't get to perform any cleanup.
By setting a handler for SIGINT, you override this behaviour. Instead of killing the process, the signal handler is called. It might not do anything, but its very existence prevents the process from being killed. In this situation, the program won't exit as a result of the signal unless it chooses to (by calling die or exit in the handler). If it does, it gets a chance to clean up as normal.
Note that if a signal for which a handler was defined comes in during a system call, the system call exits with error EINTR in order to allow the program to safely handle the signal. This is why sleep returns as soon as SIGINT is received.
If instead you had used $SIG{INT} = 'IGNORE';, the signal would have been completely ignored. Any system calls in progress won't be interrupted.
[1] On my system, man 1 kill lists the default actions of signals.
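To see all three dispositions side by side, here is a minimal sketch (mine, not from the question); run it and press ^C during the sleep, once per variant:
#!/usr/bin/env perl
use strict;
use warnings;
# Uncomment exactly one line, or neither:
# (neither)               default action: the kernel kills the process
#                         and the END block below never runs
# $SIG{INT} = sub {};     trapped: sleep returns early and the program
#                         falls through to a normal exit
# $SIG{INT} = 'IGNORE';   ignored: the signal is discarded and sleep
#                         runs to completion
sleep 20;
END { print "END block ran\n" }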

Your signal handler $SIG{INT} = sub {} isn't doing nothing; it is trapping the signal and preventing the program from exiting.
But to answer your original question, perlmod says an END code block:
is executed as late as possible, that is, after perl has finished running the program and just before the interpreter is being exited, even if it is exiting as a result of a die() function. (But not if it's morphing into another program via exec, or being blown out of the water by a signal--you have to trap that yourself (if you can).)
That is, a fatal signal, if not trapped, circumvents Perl's global destruction and does not call END blocks.
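So the practical fix for the original program is a handler that converts the signal into an ordinary exit, which lets global destruction and END blocks (including File::Temp's cleanup) run. A one-line sketch; exiting with 130 (128 plus SIGINT's signal number 2) is just a shell convention, not a requirement:
$SIG{INT} = sub { exit 130 };   # turn ^C into a normal exit so END blocks run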

Related

How does Perl know how many bytes to read in a handle from IO::Select::->can_read?

I'm using IO::Select's can_read method to select file handles that are ready for reading in a simple perl script.
However, the <...> operator on filehandles does not require a length to be passed to it.
Is IO::Select reaching inside the filehandle to set the "appropriate length" ... or what exactly is happening?
#!/usr/bin/env perl
use IO::Select;
use strict;
use warnings;
my @handles = IO::Select->new(\*STDIN)->can_read(3);
@handles == 1 or die;
my $handle = $handles[0];
print ("I read " . <$handle> . "\n");
For instance, with the invocation below, the script prints "a\n" immediately and then exits a few seconds later.
% sh -c 'echo a; sleep 5; echo b' | perl reader.pl
I read a
Exit 141
It then exits abnormally for some strange reason ... not sure where the exit status is being set.
EDIT: the apparent abnormal exit appears to be a bug in tcsh.
Neither IO::Select nor the <...> operator (the readline function) knows the length. readline just tries to read up to the end of the line. If no end-of-line character (i.e. $/) is found, it simply returns all available data in the case of a non-blocking file handle, or waits for an end of line or end of data in the case of a blocking file handle.
In your specific case, echo a; sleep 5; echo b results in a line a\n and, 5 seconds later, in a line b\n. Since your code uses <...> in scalar context, it will only read a single line. This means it stops after the first line end is found and returns that line, i.e. a\n.
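If you do want explicit control over how many bytes are read, pair can_read with sysread, which takes a maximum length, rather than readline. A sketch (the 4096 is an arbitrary buffer size of my choosing):
#!/usr/bin/env perl
use strict;
use warnings;
use IO::Select;
my @ready = IO::Select->new(\*STDIN)->can_read(3);   # wait up to 3 seconds
if (@ready) {
    # read whatever is available, up to 4096 bytes, with no line semantics
    my $n = sysread($ready[0], my $buf, 4096);
    print "sysread returned $n bytes: $buf";
}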

ForkManager SIGINT only kills current process in fork

I want to have all child processes die when I kill a perl process that is using ForkManager. In the code below, if I run it and hit ctrl+c while the sleep line is running, the sleep process is killed, but the print lines are then all simultaneously executed before the script ends. Ideally, I'd like an interrupt to immediately stop all execution. What can I do?
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;
my $fork1 = Parallel::ForkManager->new(8);
while (1) {
    $fork1->start and next;
    system("sleep 15s");
    print "Still going!";
    $fork1->finish;
}
$fork1->wait_all_children;
According to perldoc system, system actually ignores both SIGINT and SIGQUIT:
Since SIGINT and SIGQUIT are ignored during the execution of system,
if you expect your program to terminate on receipt of these signals
you will need to arrange to do so yourself based on the return value.
So if you want your processes to stop executing if you SIGINT during the system call, you need to implement that logic yourself:
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;
my $fork1 = Parallel::ForkManager->new(8);
while (1) {
    $fork1->start and next;
    print "Sleeping...";
    system("sleep 15s") == 0 or exit($?);
    print "Still going!";
    $fork1->finish;
}
$fork1->wait_all_children;
Or, the more reasonable approach is to use the Perl built-in sleep:
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;
my $fork1 = Parallel::ForkManager->new(8);
while (1) {
    $fork1->start and next;
    print "Sleeping...";
    sleep 15;
    print "Still going!";
    $fork1->finish;
}
$fork1->wait_all_children;
First off, using system means something strange may happen, because you're allowing whatever you're calling to handle signals by itself.
That may be your problem.
Otherwise, what you can do with perl is configure signal handlers: what to do if a signal is received by this process. By default, signals are either set to 'exit' or 'ignore'.
You can see the current settings via print Dumper \%SIG; (with Data::Dumper loaded).
However, the simplest solution to your problem, I think, would be to set a handler to trap SIGINT and then send a kill to your current process group. As perldoc -f kill notes:
The behavior of kill when a PROCESS number is zero or negative depends on the operating system. For example, on POSIX-conforming systems, zero will signal the current process group, -1 will signal all processes, and any other negative PROCESS number will act as a negative signal number and kill the entire process group specified.
$SIG{'INT'} = sub {
    kill('TERM', -$$);
};
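Putting the pieces together with Parallel::ForkManager, here is a sketch of the whole pattern (details such as shielding the parent from its own TERM and resetting the handler in the children are my additions, not from the original answer):
#!/usr/bin/perl
use strict;
use warnings;
use Parallel::ForkManager;
my $pm = Parallel::ForkManager->new(8);
$SIG{INT} = sub {
    $SIG{TERM} = 'IGNORE';   # shield the parent from the TERM it is about to send
    kill 'TERM', -$$;        # negative PID: signal the whole process group
    exit 1;                  # a normal exit, so END blocks still run
};
for my $i (1 .. 4) {
    $pm->start and next;     # parent: spawn a child and continue the loop
    $SIG{INT} = 'DEFAULT';   # child: don't inherit the parent's handler
    sleep 15;                # Perl's own sleep, so the signal kills us promptly
    print "Child $i finished sleeping\n";
    $pm->finish;
}
$pm->wait_all_children;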

Cleanup tmp dirs when hitting CTRL-C from perl -d debug session

For the temp directory I need in my script, I go with:
my $work_dir = File::Temp->newdir(TEMPLATE => "/tmp/work.$$.XXXX" ) or die "Cannot create tempdir directory $!";
My hope with newdir() is to get the promise of:
By default the directory is deleted when the object goes out of scope.
Only to learn that if I hit CTRL-C, $work_dir will NOT be removed.
So I added signals:
use sigtrap qw(handler signal_handler normal-signals);
and then I simply add use File::Path 'remove_tree'; and define:
sub signal_handler
{
    remove_tree $work_dir;
}
This helps if I hit CTRL-C while my script runs...
However, it does NOT clean up if I CTRL-C out of a perl -d debug session! If I quit the debugger cleanly (with the q command) it works and cleans up fine; it's only when I CTRL-C out of the debug session that $work_dir is not removed.
Is it possible, in any way, to have the signal handler being called automatically even within a perl debug session ?
(or any other "proper" ways how to use/install signal handlers)
Your signal handler isn't doing what you think it does, because passing an object to remove_tree doesn't work:
use strict;
use warnings;
use 5.010;
use File::Path qw(remove_tree);
use File::Temp;
my $tmpdir = File::Temp->newdir(CLEANUP => 0);
remove_tree $tmpdir;
say "$tmpdir still exists" if -d $tmpdir;
Outputs:
/tmp/lTfotn79RD still exists
The call to remove_tree in your signal handler seems to work when run outside of the debugger, but it's actually not doing anything. (You can prove this to yourself by commenting out the call to remove_tree and re-running your script.) So why does the directory get removed?
If a signal handler doesn't exit or die, execution continues wherever it left off before the signal was caught. In your case, after the signal handler finishes, the program simply runs to completion. When the program terminates, any objects that are still in scope are cleaned up by calling their DESTROY methods. File::Temp->newdir returns a File::Temp::Dir object; this object's DESTROY method is what actually removes the directory from the filesystem (it uses rmtree under the hood).
This doesn't work when you interrupt the debugger; I'm not familiar with the debugger's internals, but I'm guessing it keeps references to objects so that DESTROY isn't called, even when you step off the end of the program. If you Ctrl+C again at this point, the object is never cleaned up, and neither is the temporary directory.
One way I found to work around this is to explicitly undef the object returned by File::Temp->newdir in your signal handler:
use strict;
use warnings;
use 5.010;
use File::Temp;
use sigtrap qw(handler cleanup normal-signals);
my $tmpdir = File::Temp->newdir;
sub cleanup {
    my ($sig) = @_;
    say "Caught signal SIG$sig";
    undef $tmpdir;
    exit 0;
}
This causes the DESTROY method to be called before the program exits, so you're not relying on the debugger to clean up. This seems like kind of a hack, though; why not just quit the debugger gracefully with q?
Note that you could also pass the stringified version of $tmpdir to remove_tree like this:
remove_tree "$tmpdir";
But I wouldn't recommend this, since the documentation strongly cautions against relying on file names:
For maximum security, endeavour always to avoid ever looking at, touching, or even imputing the existence of the filename. You do not know that that filename is connected to the same file as the handle you have, and attempts to check this can only trigger more race conditions. It's far more secure to use the filehandle alone and dispense with the filename altogether.
I like to use an END block. Any clean exit from the program, especially a 'quit' from the debugger, will trigger the END block and, in my case, delete all my test data.
So put your cleanup code in an END block and have your signal handler call exit() instead of remove_tree.
END {
    remove_tree "$work_dir";   # stringified: remove_tree wants a path, not the object
}
sub signal_handler
{
    exit();
}
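A self-contained sketch of this approach under the question's setup (the stringification of $work_dir follows the caveat in the previous answer):
#!/usr/bin/env perl
use strict;
use warnings;
use File::Temp ();
use File::Path 'remove_tree';
use sigtrap qw(handler signal_handler normal-signals);
# CLEANUP => 0 so the END block, not DESTROY, owns the removal
my $work_dir = File::Temp->newdir(CLEANUP => 0);
sub signal_handler { exit 1 }   # exit, unlike an untrapped signal, runs END
sleep 20;                       # simulate long-running work; try ^C here
END { remove_tree("$work_dir") if defined $work_dir && -d "$work_dir" }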

perl: can an END block be called when the program is killed?

BEGIN {
    while (1) {
        print "hi\n";
    }
}
END {
    print "end is called\n";
}
in shell:
kill <pid>
OUTPUT:
hi
hi
hi
hi
hi
hi
hi
hi
hi
Terminated
The END block didn't get called when I killed it via kill or Ctrl-C.
Is there something equivalent that will always get called before the program exits?
Ctrl-C sends a SIGINT to your program. You can 'catch' this with a signal handler by setting the appropriate entry in %SIG. I would also note that I don't see why you're using BEGIN that way. BEGIN is a special code block that's called at compile time, at the very first opportunity. That means it's triggered when you run perl -c to validate your code, and as such it's a really bad idea to put an infinite loop in one. See: perlmod
E.g.
#!/usr/bin/perl
use strict;
use warnings;
$SIG{'INT'} = \&handle_kill;
my $finished = 0;
sub handle_kill {
    print "Caught a kill signal\n";
    $finished++;
}
while ( not $finished ) {
    print "Not finished yet\n";
    sleep 1;
}
END {
    print "end is called\n";
}
But there's a drawback - some signals you can't trap in this way. See perlipc for more details.
Some signals can be neither trapped nor ignored, such as the KILL and STOP (but not the TSTP) signals. Note that ignoring signals makes them disappear. If you only want them blocked temporarily without them getting lost you'll have to use POSIX' sigprocmask.
By default, if you send a kill, it sends a SIGTERM, so you may want to override that handler too. However, it's typically considered bad to do anything other than exit gracefully on SIGTERM; it's more acceptable to 'do something' and resume when trapping SIGHUP (hangup) and SIGINT.
You should note that Perl does 'safe signals', though: some system calls won't be interrupted; perl waits for them to return before processing the signal. That's because bad things can happen if certain operations are aborted (a close on a file that is still flushing data might leave it corrupt, for example). Usually that's not a problem, but it's something to be aware of.
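For completeness, here is what the sigprocmask route mentioned in that perlipc quote looks like; a minimal sketch using the POSIX module:
use strict;
use warnings;
use POSIX qw(SIG_BLOCK SIG_UNBLOCK SIGINT);
my $sigset = POSIX::SigSet->new(SIGINT);
POSIX::sigprocmask(SIG_BLOCK, $sigset) or die "cannot block SIGINT: $!";
# ... critical section: a ^C arriving here is held pending, not lost ...
POSIX::sigprocmask(SIG_UNBLOCK, $sigset) or die "cannot unblock SIGINT: $!";
# a pending SIGINT, if any, is delivered around this point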
Put the proper signal handler in your code:
$SIG{INT} = sub { die "Caught a sigint $!" };
Ctrl-C sends the SIGINT signal to the script, which is caught by this handler.

Perl Detecting DESTROY as result of die()

I have an object which I do lots of stuff to, and then finish up by calling a TidyUp() method which displays some stats about what happened.
As part of internal testing, I wanted to detect when these objects were being created, but not tidied up before the program exited, and in that case die.
So I added a flag to the object which is set in TidyUp(), and a DESTROY subroutine for the class that dies if this flag isn't set.
This all works fine, but it has an unfortunate side-effect. If something else goes wrong in the program, which causes die to be called, then the global destruction fires off my DESTROY, which notices that the object hasn't been tidied and dies. The problem is that I lose the original die message at this point.
So I'd like to detect whether the destruction occurred as a result of a die, or just normal program exit, and only in the latter case add my new die.
I had a look at Detecting global destruction in Perl but I don't think that helps, as I want to detect what triggered global destruction.
You could set a global flag before you check whether your objects are tidied up. Then you know in which phase your program is running. You could also try to register a callback when your program dies using $SIG{__DIE__}. Checking $? is not safe since it could be set by other means. Checking whether you're in global destruction should work, too. But the cleanest solution would be to store all objects that have to be tidied up in an extra array. Then you can loop over these objects whenever you want and do away with using DESTROY.
From playing around with this problem, I see that the call stack in the DESTROY method is slightly different depending on how the program exits. Maybe that could help:
package Foo;
sub new { bless [], __PACKAGE__ }
sub DESTROY {
    my ($n, $pkg, $file, $line);
    while (my @c = caller($n++)) {
        ($pkg, $file, $line) = @c;
    }
    print STDERR "DESTROY called via $file line $line\n";
}
my $foo = Foo->new;
if ($ARGV[0] eq 'die') { die } # line 11
if ($ARGV[0] eq 'exit') { exit } # line 12
# else passive exit
$ perl 14255585.pl die
Died at 14255585.pl line 11.
DESTROY called via 14255585.pl line 11
$ perl 14255585.pl exit
DESTROY called via 14255585.pl line 12
$ perl 14255585.pl foo
DESTROY called via 14255585.pl line 0
If the list of exit points in your program is small and well-defined, you could just enumerate them and handle them when the program ends. Otherwise, you could do some on-the-fly static code analysis to see what the likely cause of death is.
You can hook into $SIG{__DIE__}, the global exception handler:
#!/usr/bin/perl
use Modern::Perl;
package Foo;
my $DIED = 0;
BEGIN { $SIG{__DIE__} = sub { $DIED = 1 }; }
sub new { bless [0] }
sub DESTROY { die 'untidy' unless shift->[0] or $DIED }
sub tidy_up { shift->[0] = 1 }
package main;
my $foo = new Foo;
die if @ARGV; # line 13
$foo->tidy_up();
say 'success';
$ perl test.pl
success
$ perl test.pl die
Died at test.pl line 13.
Since that exception handler is installed globally, make sure you don't override an existing error handler. Signals::XSIG helps with that.
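If pulling in Signals::XSIG isn't an option, a common hand-rolled alternative (a sketch, not from the original answer) is to chain to whatever handler was already installed:
my $prev_die = $SIG{__DIE__};
$SIG{__DIE__} = sub {
    $DIED = 1;                                    # our own bookkeeping
    $prev_die->(@_) if ref $prev_die eq 'CODE';   # defer to the old handler
};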