Why is it not possible to terminate a loop from a signal handler in Perl?

Can anybody explain why the following script does not work? What causes the Label not found for "last SOME_BLOCK" error?
#!/usr/bin/perl
use v5.14;
SOME_BLOCK: {
    alarm 1;
    $SIG{ALRM} = sub {
        last SOME_BLOCK;
    };
    my $count = 0;
    while (1) {
        $count += 1;
        say $count;
    }
};

Exiting a subroutine via last or next is forbidden according to perldoc (and generally triggers a warning). This is because it's quite messy - Perl would need to search dynamically up scopes to find the block that you're trying to skip, and call return from various functions (but what return value should be used?). return is generally safer.
In the signal handling context, it's extra messy because Perl actually has to pause execution of your script in order to execute the signal handler. So it's now running two separate execution contexts, and the signal handler context cannot affect the control flow of the main context directly, which is why you get that error.
There are two things you can do:
- Throw an exception (using die) and catch it in the outer block. This is undesirable, as it could interrupt pretty much anything.
- Set a global flag outside the signal handler (e.g. $caught_signal = 1) and check it in the inner code at a convenient point.
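A minimal sketch of the flag approach, adapted from the script in the question (the variable name is illustrative):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The handler only records that the signal arrived; the main loop
# checks the flag at a point where it is safe to leave.
my $caught_signal = 0;
$SIG{ALRM} = sub { $caught_signal = 1 };

alarm 1;
my $count = 0;
while (1) {
    last if $caught_signal;    # control flow stays in the main context
    $count += 1;
}
print "stopped after $count iterations\n";
```

Because the handler no longer tries to alter the main context's control flow, there is no Label not found error; the loop simply notices the flag on its next iteration.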

Related

Is the use of an uninitialized variable undefined behavior?

I don't know if "undefined behavior" means something in Perl, but I would like to know whether using uninitialized variables in Perl may provoke unwanted behavior.
Let's consider the following script:
use strict;
use warnings FATAL => 'all';
use P4;

my $P4;

sub get {
    return $P4 if $P4;
    # ...connection to Perforce server and initialization of $P4 with a P4 object...
    return $P4;
}

sub disconnect {
    $P4 = $P4->Disconnect() if $P4;
}

sub getFixes {
    my $change = shift;
    my $p4 = get();
    return $p4->Run( "fixes", "-c", $change );
}
Here, the variable $P4, which is meant to store a P4 object after a connection to a Perforce server, is not initialized at the beginning of the script. However, whatever the function which is called first (get, disconnect or getFixes), the variable will be initialized before being used.
Is there any risk in doing that? Should I explicitly initialize the $P4 variable at the beginning of the script?
Just a couple of straight-up answers to basic questions asked.
if "undefined behavior" means something in Perl
Yes, there is such a notion in Perl, and documentation warns of it (way less frequently than in C). See some examples in footnote †. On the other hand, at many places in documentation one finds a discussion ending with
... So don't do that.
It often comes up for things that would confuse the interpreter and could result in strange and possibly unpredictable behavior. These are sometimes typical "undefined behavior" even if they are not labeled as such.
The main question is how uninitialized variables relate to this, per the title and
if using not initialized variables in Perl may provoke unwanted behaviors
This does not generally result in "undefined behavior", but it may of course lead to trouble, and one mostly gets a warning for it -- unless, of course, the variable is legitimately being initialized by such "use". For example,
my $x;
my $z = $x + 3;
will draw a warning for the use of $x but not for $z (if warnings are enabled!). Note that this still succeeds, as the undef in $x is treated as 0 in numeric context. (But with what is shown in the question, the code will abort at that point, due to the FATAL.)
The code shown in the question seems fine in this sense, since as you say
the variable will be initialized before being used
Testing for truth against an uninitialized variable is fine since once it is declared it is equipped with the value undef, admissible (and false) in such tests.
See the first few paragraphs in Declarations in perlsyn for a summary of sorts on when one does or doesn't need a variable to be defined.
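A small illustration of that point, with fatal warnings enabled as in the question's code:

```perl
use strict;
use warnings FATAL => 'all';

my $P4;    # declared but not initialized: holds undef

# A plain truth test of an undef variable is fine and draws no warning...
if (!$P4) {
    print "no connection yet\n";
}

# ...whereas using it as a number or string would warn, and here die:
# my $n = $P4 + 1;    # "Use of uninitialized value" -- fatal under FATAL => 'all'
```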
† A list of some behaviors specifically labeled as "undefined" in docs
Calling sort in scalar context
In list context, this sorts the LIST and returns the sorted list value. In scalar context, the behaviour of sort is undefined.
Length too great in truncate
The behavior is undefined if LENGTH is greater than the length of the file.
Using flags for sysopen which are incompatible (nonsensical)
The behavior of O_TRUNC with O_RDONLY is undefined.
Sending signals to a process-list with kill, where one can use negative signal or process number to send to a process group
If both the SIGNAL and the PROCESS are negative, the results are undefined. A warning may be produced in a future version.
From Auto-increment and Auto-decrement (perlop)
... modifying a variable twice in the same statement will lead to undefined behavior.
Iterating with each, tricky as it may be anyway, isn't well behaved if hash is inserted into
If you add or delete a hash's elements while iterating over it, the effect on the iterator is unspecified; for example, entries may be skipped or duplicated--so don't do that. It is always safe to delete the item most recently returned by each, ...
This draws a runtime warning (F), described in perldiag
Use of each() on hash after insertion without resetting hash iterator results in undefined behavior.
Statement modifier (perlsyn) used on my
The behaviour of a my, state, or our modified with a statement modifier conditional or loop construct (for example, my $x if ...) is undefined.
Some of these seem a little underwhelming (predictable), given what UB can mean. Thanks to ikegami for comments. A part of this list is found in this question.
Pried from docs current at the time of this posting (v5.32.1)
A variable declared with my is initialized with undef. There is no undefined behaviour here.
This is documented in perldoc perlsub:
If no initializer is given for a particular variable, it is created with the undefined value.
However, the curious construct my $x if $condition does have undefined behaviour. Never do that.
my initializes scalars to undef, and arrays and hashes to empty.
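For example:

```perl
use strict;
use warnings;

my $scalar;    # holds undef
my @array;     # empty list
my %hash;      # empty hash

print defined($scalar) ? "defined\n" : "undef\n";   # undef
print scalar(@array), " elements\n";                # 0 elements
print scalar(keys %hash), " keys\n";                # 0 keys
```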
Your code is fine, though I would take a different approach to destruction.
Option 1: Provide destructor through wrapping
use Object::Destroyer qw( );
use P4 qw( );

my $P4;

sub get {
    return $P4 ||= do {
        my $p4 = P4->new();
        $p4->SetClient(...);
        $p4->SetPort(...);
        $p4->SetPassword(...);
        $p4->Connect()
            or die("Failed to connect to Perforce Server");
        Object::Destroyer->new($p4, 'Disconnect')
    };
}

# No disconnect sub
Option 2: Provide destructor through monkey-patching
use P4 qw( );

BEGIN {
    my $old_DESTROY = P4->can('DESTROY');
    my $new_DESTROY = sub {
        my $self = shift;
        $self->Disconnect();
        $old_DESTROY->($self) if $old_DESTROY;
    };
    no warnings qw( redefine );
    *P4::DESTROY = $new_DESTROY;
}

my $P4;

sub get {
    return $P4 ||= do {
        my $p4 = P4->new();
        $p4->SetClient(...);
        $p4->SetPort(...);
        $p4->SetPassword(...);
        $p4->Connect()
            or die("Failed to connect to Perforce Server");
        $p4
    };
}

# No disconnect sub

How to detect if Perl code is being run inside an eval?

I've got a module that uses an @INC hook and tries to install missing modules as they are used. I don't want this behaviour to fire inside an eval. My attempt to do this currently is:
return
    if ( ( $caller[3] && $caller[3] =~ m{eval} )
      || ( $caller[1] && $caller[1] =~ m{eval} ) );
That's the result of me messing around with the call stack in some experiments, but it's not catching everything, like this code in HTTP::Tinyish:
sub configure_backend {
    my ($self, $backend) = @_;
    unless (exists $configured{$backend}) {
        $configured{$backend} =
            eval { require_module($backend); $backend->configure };
    }
    $configured{$backend};
}

sub require_module {
    local $_ = shift;
    s!::!/!g;
    require "$_.pm";
}
Maybe I just need to traverse every level of the call stack until I hit an eval or run out of levels. Is there a better or easier way for me to figure out whether or not code is being wrapped in an eval without traversing the call stack?
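For what it's worth, the stack-walking idea can be written compactly with caller, which reports the literal string "(eval)" as the subroutine name for eval frames (a sketch; it shares the limitations discussed in the answers):

```perl
use strict;
use warnings;

# Walk up the call stack; caller($i) returns each frame's info, and
# element 3 is the subroutine name, which is "(eval)" for eval frames.
sub in_eval {
    my $i = 0;
    while (my @frame = caller($i++)) {
        return 1 if $frame[3] eq '(eval)';
    }
    return 0;
}

print in_eval() ? "in eval\n" : "not in eval\n";            # not in eval
print eval { in_eval() } ? "in eval\n" : "not in eval\n";   # in eval
```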
Post mortem on this question:
- As was suggested by multiple posters, this was basically a bad idea.
- $^S is technically a correct way to do this, but it doesn't let you know if you're inside an eval that was called somewhere higher in the stack.
- Using a regex + Carp::longmess() seems to be the most concise way to figure this out.
- Knowing if code is running inside an eval may be somewhat helpful for informational purposes, but since this could be happening for many different reasons, it's very hard to infer why it's happening.
- Regardless, this was an interesting exercise. I thank all contributors for their helpful input.
Carp::longmess traverses the stack for you in one call, if that makes things easier
return if Carp::longmess =~ m{^\s+eval }m
If $^S is true, the code is inside an eval.
sub foo { print $^S }
eval { foo() }; # 1
foo(); # 0
Don't try to do this in reusable code. There are many reasons to be in an eval and not want this kind of action at a distance change.

ONE::Timer doesn't call the subroutine

I am having a problem with a project I develop at work. It is a multi-threaded application. I use Moose for the Object Oriented Programming and ONE::Timer for the timer. I think I am not using it well as my subroutine is never called. Basically, here is my code:
sub add {
    my ($self, $bot) = @_;
    my $provision_object = $bot->provision->new;
    $provision_object->compute;
}
and my compute method looks like the following:
sub compute {
    <A couple of line code...>
    ONE::Timer->at($time1, sub { <computations1> });
    ONE::Timer->at($time2, sub { <computations2> });
}
$time1 and $time2 contain timestamps. As my script is a server (so it never finishes), I am sure that these times are reached before the end of the script.
The two subroutines scheduled for $time1 and $time2 are never called. In the documentation of the ONE::Timer->at method, it says:
If you store the return value, it acts as a guard-- if it's destroyed then the timer is canceled.
That's why I do not store the return value of my call to the method. Am I missing something here?
Thanks a lot for any answers ;)
So, ONE is an event loop*, which means you have to enter the loop at some point. Normally you'd enter your event loop by calling ONE->loop().
The call to ONE->loop() won't return until one of your event listeners calls ONE->stop().
One option would be to wrap your code in a collect block, eg:
use ONE qw( Collect );

collect {
    # The main body of your program
};
Collect will only return after all of your events have fired once.
If you want green-threads, that is, if you want your event listeners to effectively run asynchronously without further intervention, write your program thusly:
use EV; # Optional, but everything is better with it
use Coro;
use ONE;
async {
    # Put the body of your program here.
    # In order for Coro to trigger and execute your
    # event handlers, you'd need to be blocking on
    # IO somewhere in here.
    ONE->stop(); # Call when you want your program to exit
};
ONE->loop();
Coro has its own caveats however, and you should read up on it before jumping in.
With the version on Github, this would be the equivalent
use EV;
use Coro;
use ONE;
ONE->loop(sub {
    # Your main body
    ONE->stop;
});
* ONE is a wrapper around AnyEvent; it does not implement an event loop itself. Rather, it provides a mostly thin abstraction around AnyEvent.

Is this non-forking Perl timeout method safe?

I have a function which calls several other functions, e.g.
sub do_work {
    send_mail();
    send_soap_envelope();
    send_rpc();
}
The called functions may possibly hang, so I want to stop them after some timeout. I want to avoid forking because it is expensive in my context (e.g. database handles need to be recreated after each fork). I've come up with the following scheme:
sub timeout {
    my ($code) = @_;
    eval {
        alarm 2;
        local $SIG{ALRM} = sub { die 'timeout' };
        &$code;
        alarm 0;
    };
    # handling of $@ eq 'timeout' removed for brevity
}

sub do_work {
    timeout \&send_mail;
    timeout \&send_soap_envelope;
    timeout \&send_rpc;
}
The timeout() function (in this example hardcoded to a timeout of 2 seconds) uses an eval block as a means of aborting the execution of the payload function using die.
This works fine in my test scenarios, but I'm feeling uneasy about what will happen if the die interrupts the payload function while the Perl interpreter is not in a "safe state", e.g. while it is processing an XS subroutine. Is my gut feeling right?
Since 5.8.1, Perl uses "safe signal handling". It doesn't give your signal handler to the system; it gives a safe signal handler instead. This safe signal handler simply notes that a signal was received and returns. Between executing Perl opcodes, the interpreter checks* if any signals were received and calls your signal handler if there was.
That means that signals will not get processed in the middle of a long op, such as a long XS call or a long regex match. Signals interrupt most system calls, so your signal handler will be called shortly after the signal comes in even if you are in the middle of a blocking system call (e.g. sleep, read, etc).
alarm(2);
my $s = time;
$SIG{ALRM} = sub {
    my $e = time;
    print $e-$s, "\n"; # 6, not 2.
};
('a' x 25) =~ /a*a*a*a*a*a*a*a*a*(?:b|c)/;
* — In order to speed up programs, it's checked a little less often now, but you shouldn't notice the difference.
It's not quite safe because it calls alarm() before it installs the SIGALRM handler. Swap the local $SIG{ALRM} and alarm lines, and it should be much improved.
Okay, right now I see that perldoc -f alarm mentions my exact use-case:
If you want to use "alarm" to time out a system call you need to use an "eval"/"die" pair.
(Followed by example code there.)
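Putting the advice together (handler installed before alarm is armed, eval/die pair, alarm cleared on the way out), a corrected sketch of the timeout() wrapper from the question might look like this (names and the 2-second limit kept from the original; the return-value convention is my own addition):

```perl
use strict;
use warnings;

sub timeout {
    my ($code) = @_;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };  # handler first...
        alarm 2;                                     # ...then arm the alarm
        $code->();
        alarm 0;
        1;
    };
    alarm 0;    # clear the alarm even if $code died for another reason
    if (!$ok) {
        die $@ unless $@ eq "timeout\n";    # propagate unrelated errors
        return 0;                           # signal that we timed out
    }
    return 1;
}
```

With this, timeout(sub { sleep 10 }) returns false after roughly two seconds, while timeout(sub { 1 }) returns true immediately.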

How can an SVN::Error callback identify the context from which it was called?

I've written some fairly extensive Perl modules and scripts using the Perl bindings SVN::Client etc. Since the calls to SVN::Client are all deep in a module, I have overridden the default error handling.
So far I have done so by setting
$SVN::Error::handler = undef;
as described in the docs, but this makes the individual calls a bit messy because you have to remember to make each call to SVN::Client in list context and test the first value for errors.
I would like to switch to using an error handler I would write; but $SVN::Error::handler is global, so I can't see any way that my callback can determine where the error came from, and what object to set an error code in.
I wondered if I could use a pool for this purpose: so far I have ignored pools as irrelevant to working in Perl, but if I call a SVN::Client method with a pool I have created, will any SVN::Error object be created in the same pool?
Has anybody any knowledge or experience which bears on this?
OK, I'm going to assume the issue is that (a) you want to set a flag in some object when an error occurs, and then check the flag later at the end of all operations, and (b) that your error handler (in a global variable) needs some way to know which object to touch. You can achieve this using a closure, something like the following:
#
# This part is the library that implements error handling a bit like
# SVN::Client
#
use strict;
use warnings;
use Carp;

sub default_error_handler {
    croak "An error occurred: $_[0]";
}

our $global_error_handler = \&default_error_handler;

sub library_function_that_might_fail {
    &$global_error_handler("Guess what - it failed!");
}

#
# This part is the function that wants to detect an error
#
sub do_lots_of_stuff {
    my $error = undef; # No errors so far!
    local($global_error_handler) = sub { $error = $_[0]; };
    library_function_that_might_fail();
    library_function_that_might_fail();
    library_function_that_might_fail();
    if ($error) {
        print "There was an error: $error\n";
    }
}

#
# Main program
#
do_lots_of_stuff();
The key is that when, in do_lots_of_stuff(), we set the error handler to an anonymous sub, that sub continues to have access to the local variables of the function that created it - so it can modify $error to signal that an error occurred.