Why is "fork inside BEGIN ... a horrible prospect" in Perl? - perl

As part of a comment on another SO Q&A, a user noted that:
fork inside BEGIN is a horrible prospect
Why is that a "horrible prospect"? (on a technical level - let's leave aside readability or prettiness of design)
use strict;

BEGIN {
    # Begin block to fork off a child process.
    # This is done in BEGIN, because otherwise My::HeavyModule module
    # will be loaded BEFORE the fork and thus inherited by child process
    # which is something we want to avoid
    my $init = 1; # Some lightweight init code
    if (my $pid = fork()) {
        # Nothing to do here, proceed to the rest of the main program
    } else {
        die "cannot fork" unless defined $pid;
        print "Child process started!\n";
        exit 0;
    }
} # End BEGIN block
use My::HeavyModule; # Very heavyweight on both startup time and memory
# Start parent process logic using My::HeavyModule;
Just to be clear: I am NOT asking if there are better ways to achieve what this code is doing. I'm asking why this approach was called "horrible", NOT whether it can be replaced by something that may be better.

I wrote the following comment which you are referring to:
fork inside BEGIN is a horrible prospect. You could also delay compilation for some parts using eval "string" or require, but that also has its issues.
I stand by it, although not for technical reasons (there are none beyond what dan1111 mentioned, plus an explicit portability warning in perlfork), but because it completely broke my expectations about any program's behaviour.
Each piece of code has a compile time. I expect that packages and classes will be set up here, and maybe that some metaprogramming takes place.
Then, there is a run time during which the main control flow of our program occurs, during which the job of this program is being carried out.
Perl complicates this simple distinction in that one block's compile time is another block's run time (e.g. via BEGIN or use or eval or require). But there is always a clear reference point: the phase of $0, the program originally being invoked (see also ${^GLOBAL_PHASE}). There will always be cases where it is a good idea to do funky things during the main script's compile time, but that doesn't make it a best practice – on the contrary, hence my objection that it would be a “horrible prospect”.
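To make the phase distinction concrete, here is a minimal sketch (my illustration, not part of the original comment) that prints ${^GLOBAL_PHASE} during the main script's compile time and again at run time:
use strict;
use warnings;

BEGIN {
    # Runs while the main script is still being compiled;
    # ${^GLOBAL_PHASE} requires perl 5.14 or newer.
    print "inside BEGIN, phase is ${^GLOBAL_PHASE}\n";   # prints "START"
}

# Runs only after the whole file has compiled successfully.
print "at run time, phase is ${^GLOBAL_PHASE}\n";        # prints "RUN"
A fork inside BEGIN therefore happens while the phase is still START, i.e. before the rest of the file has even been compiled.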
If the main job of your program is to kick off two other independent programs, it might look like this:
use strict;
use warnings;

my $pid = fork;
if (not defined $pid) {
    die "welp, can't fork: $!";
}
if ($pid) {
    exec $^X, "heavy_program_you_intended_to_be.pl", @ARGV;
}
else {
    ...;   # background yourself, etc.
    exec $^X, "light_program_you_intended_to_kick_off.pl";
}
But it would not fork in a BEGIN. I think this is one of many examples where you can do something with Perl, but this doesn't mean you should.

You are running a process of your program before you even know whether the whole thing compiles successfully (see the sketch after this list). That doesn't seem like the type of thing you should want to do.
Readability.
Prettiness of design.
(Sorry, I know you said to ignore those last two. But that isn't really fair, is it? Both are very important considerations in whether code is a "horrible prospect".)
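To illustrate point 1, here is a small sketch (mine, not the original code): the forked child runs and prints its message even though a deliberate syntax error later in the file means the program as a whole never compiles:
use strict;

BEGIN {
    my $pid = fork();
    die "cannot fork" unless defined $pid;
    if ($pid == 0) {
        print "child is already running...\n";   # this still gets printed
        exit 0;
    }
}

my $x = ;   # deliberate syntax error: the parent aborts during compilation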
On the other hand, I love that you can do ill-advised things like this in Perl. And I'm not going to tell you you need to re-write your code. That depends too much on the situation: how mission-critical is this code? How maintainable does it need to be? How many other developers will work on it? etc.

Related

perl - two stage conditional compilation

I have pretty big perl script executed quite frequently (from cron).
Most executions require pretty short & simple tests.
How to split single file script into two parts with "part two" compiled based on "part 1" decision?
Considered solution:
using a BEGIN { …; exit if …; } block for the trivial test (sketched below).
a two-file solution, with file_1 using require to compile & execute file_2.
I would prefer single file solution to ease maintenance if the cost is reasonable.
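To make the first considered option concrete, a sketch might look roughly like this (my illustration; My::Heavy::Deps and the file test are placeholders):
use strict;
use warnings;

BEGIN {
    # the trivial test runs at compile time, before anything heavy is compiled
    my $need_full_run = -e '/tmp/please_run_full_check';   # placeholder test
    unless ($need_full_run) {
        print "quick check only, nothing to do\n";
        exit 0;   # nothing below this BEGIN block is ever compiled
    }
}

use My::Heavy::Deps;   # placeholder for the expensive dependencies
# ... the expensive "part two" of the script ...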
First, you should measure how long the compilation really takes, to see if this "optimization" is even necessary. If it does happen to be, then since you said you'd prefer a one-file solution, one possible solution is using the __DATA__ section for code like so:
use warnings;
use strict;

# measure compilation and execution time
use Time::HiRes qw/ gettimeofday tv_interval /;
my $start;
BEGIN { $start = [gettimeofday] }
INIT  { printf "%.06f\n", tv_interval($start) }
END   { printf "%.06f\n", tv_interval($start) }

my $condition = 1; # dummy for testing

# conditionally compile and run the code in the DATA section
if ($condition) {
    eval do { local $/; <DATA>.'; 1' } or die $@;
}
__DATA__
# ... lots of code here ...
I see two ways of achieving what you want. The simple one would be to divide the script in two parts. The first part will do the simple tests. Then, if you need to do more complicated tests you may "add" the second part. The way to do this is using eval like this:
<first-script.pl>
...
eval `cat second-script.pl`;
if ($@) {
    print STDERR $@, "\n";
    die "Errors in the second script.\n";
}
Or using File::Slurp in a more robust way:
eval read_file("second-script.pl", binmode => ':utf8');
Or, following @amon's suggestion, do:
do "second-script.pl";
Only beware that do is different from eval in this way:
It also differs in that code evaluated with do FILE cannot see lexicals in the enclosing scope; eval STRING does. It's the same, however, in that it does reparse the file every time you call it, so you probably don't want to do this inside a loop.
The eval will execute in the context of the first script, so any variables or initializations will be available to that code.
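Here is a small, self-contained sketch of that difference (it writes its own second-script.pl purely for illustration, and uses backticks like the example above, so it is Unix-flavoured): the eval'd string sees the caller's lexical variable, while do FILE does not:
use strict;
use warnings;

# write a tiny second script so the demo is self-contained
open my $fh, '>', 'second-script.pl' or die $!;
print $fh 'print defined $first_var ? "first_var = $first_var\n" : "first_var is not visible here\n";';
close $fh;

my $first_var = 42;

eval `cat second-script.pl`;   # eval STRING: sees the lexical, prints "first_var = 42"
die $@ if $@;

do './second-script.pl';       # do FILE: fresh scope, prints "first_var is not visible here"
die $@ if $@;

unlink 'second-script.pl';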
Related to this, there is this question: Best way to add dynamic code to a perl application, which I asked some time ago (and answered myself with the help of the comments provided and some research.) I took some time to document everything I could think of for anyone (and myself) to refer to.
The second way I see would be to turn your testing script into a daemon and have the crontab bit call this daemon as necessary. The daemon remains alive, so any data structures that you may need will remain in memory. On the downside, this will consume resources continuously, as the daemon process will always be running.

Using filehandles in Perl to alter actively running code

I've been learning about filehandles in Perl, and I was curious to see if there's a way to alter the source code of a program as it's running. For example, I created a script named "dynamic.pl" which contained the following:
use strict;
use warnings;
open(my $append, ">>", "dynamic.pl");
print $append "print \"It works!!\\n\";\n";
This program adds the line
print "It works!!\n";
to the end of its own source file, and I hoped that once that line was added, it would then execute and output "It works!!"
Well, it does correctly append the line to the source file, but it doesn't execute it then and there.
So I assume that when perl executes a program, it loads it into memory and runs it from there. My question is: is there a way to access this loaded version of the program, so you can have a program that alters itself as it runs?
The missing piece you need is eval EXPR. This compiles and evaluates any string as code.
my $string = q[print "Hello, world!";];
eval $string;
This string can come from any source, including a filehandle.
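For example, a minimal sketch (snippet.pl is a hypothetical file containing Perl code) that slurps code from a filehandle and evaluates it:
use strict;
use warnings;

# slurp Perl source from a filehandle ...
open my $fh, '<', 'snippet.pl' or die "cannot open snippet.pl: $!";
my $code = do { local $/; <$fh> };
close $fh;

# ... then compile and run it
eval $code;
die "snippet.pl failed: $@" if $@;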
It also doesn't have to be a single statement. If you want to modify how a program runs, you can replace its subroutines.
use strict;
use warnings;
use v5.10;
sub speak { return "Woof!"; }
say speak();
eval q[sub speak { return "Meow!"; }];
say speak();
You'll get a Subroutine speak redefined warning from that. It can be suppressed with no warnings "redefine".
{
    # The block is so this "no warnings" only affects
    # the eval and not the entire program.
    no warnings "redefine";
    eval q[sub speak { return "Shazoo!"; }];
}
say speak();
Obviously this is a major security hole. There are many, many, many things to consider here, too many for one answer, and I strongly recommend you not do this and find a better solution to whatever problem you're trying to solve this way.
One way to mitigate the potential for damage is to use the Safe module. This is like eval but limits what built in functions are available. It is by no means a panacea for the security issues.
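A rough sketch of what using Safe looks like (the $untrusted string here is just a stand-in for code arriving from elsewhere):
use strict;
use warnings;
use Safe;

my $compartment = Safe->new;

# stand-in for code obtained from an untrusted source
my $untrusted = q{ my $x = 2 + 3; "the result is $x" };

# reval() compiles and runs the string inside the restricted compartment;
# opcodes outside the permitted set (open, system, ...) are refused.
my $result = $compartment->reval($untrusted);
die "untrusted code failed: $@" if $@;
print "$result\n";
As noted above, this narrows the attack surface but is by no means a complete answer to the security issues.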
With a warning about all kinds of issues, you can reload modules.
There are packages for that, for example, Module::Reload. Then you can write code that you intend to change in a module, change the source at runtime, and have it reloaded.
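With Module::Reload the pattern is roughly this (a sketch; ModuleWithCodeThatChanges and its do_something() sub are hypothetical, matching the hand-rolled example below):
use strict;
use warnings;
use Module::Reload;
use ModuleWithCodeThatChanges;   # hypothetical module whose source you edit while this runs

while (1) {
    Module::Reload->check;                        # re-requires any loaded module whose file changed
    ModuleWithCodeThatChanges::do_something();    # hypothetical sub, picked up from the new source
    sleep 5;
}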
By hand you would delete that from %INC and then require, like
# ... change source code in the module ...
delete $INC{'ModuleWithCodeThatChanges.pm'};
require ModuleWithCodeThatChanges;
The only reason I can think of for doing this is experimentation and play. Otherwise, there are all kinds of concerns with doing something like this, and whatever your goal may be there are other ways to accomplish it.
Note The question does specify a filehandle. However, I don't see that to be really related to what I see to be the heart of the question, of modifying code at runtime.
The source file isn't used after it's been compiled.
You could just eval it.
use strict;
use warnings;
my $code = <<'__EOS__';
print "It works!!\n";
__EOS__

open(my $append_fh, ">>", "dynamic.pl")
    or die($!);

print($append_fh $code);

eval("$code; 1")
    or die($@);
There's almost definitely a better way to achieve your end goal here. BUT, you could recursively make exec() or system() calls -- the latter if you need a return value. Be sure to set up some condition or the dominoes will keep falling. Again, you should rethink this, unless it's just practice of some sort, or maybe I don't get it!
Each call should execute the latest state of the file; also be sure to close the file before each call.
i.e.,
exec("dynamic.pl"); or
my retval;
retval = system("perl dynamic.pl");
Don't use eval ever.

Turn off or disallow fatal errors during script run?

We have a while(1) script that loops through its various workings, then sleeps for 60 seconds and then goes through it all again. This script needs to run 24/7.
As we add new functionality within the while(1) loop, or just over the course of random issues, sometimes things fail and unfortunately crash the entire script. The solution has been wrapping any such functions in eval{}, but my question is: is there any way to globally set that all errors or fatals do NOT halt/kill the entire script, so we don't have to wrap everything in eval{}?
What you are trying to do – ignore any errors, carry on at all costs – is a very questionable practice, may bring your program into undefined state, and makes actual bugs even harder to find.
You could in theory override the CORE::GLOBAL::die subroutine to catch exceptions from your Perl code, but a real die is still available as the CORE::die sub, and this doesn't trap errors from XS code or perl itself (unlike using eval). Note that some modules use die and warn for control flow. Consider the following code:
use Carp;

sub foo {
    my ($x, $y) = @_;
    croak "X must be smaller than Y" unless $x < $y;
    return $y - $x;
}
Now if die becomes a warn, then that function could start to output negative numbers, wreaking all kinds of havoc (e.g. when used for array indices).
Please, stay with the eval solution, or even better: migrate to Try::Tiny. Fatal errors exist for a reason.
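A Try::Tiny version of the per-chunk wrapping might look like this sketch (risky_task() is a placeholder for one unit of work that may die):
use strict;
use warnings;
use Try::Tiny;

sub risky_task { die "something went wrong\n" if rand() < 0.5 }   # placeholder

while (1) {
    try {
        risky_task();
    }
    catch {
        warn "task failed, carrying on: $_";   # the error is in $_ inside catch
    };
    sleep 60;
}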
If high reliability is a must, you may want to adopt an Erlang-like model: A pool of worker processes. If an error turns up, that process is killed, and a replacement process started.
That makes no sense. How would the program know where to resume? You must tell it where, and you do that using eval.
You would surely be better off writing a wrapper for the script that logs failures and restarts it, as sketched below.
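Such a wrapper can be very small, for example (worker.pl is a hypothetical name for the long-running script):
use strict;
use warnings;

# restart the worker whenever it exits abnormally
while (1) {
    my $status = system($^X, 'worker.pl');   # worker.pl: placeholder script name
    last if $status == 0;                    # clean exit: stop supervising
    warn "worker exited with status $status; restarting in 5 seconds\n";
    sleep 5;
}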
Despite the other answers, the fact is that some errors may not be easily recoverable and just ignoring them and trudging along could easily cause unwanted behavior. Another option is to remove the while loop so that the script only executes once, and call it from cron, which allows you to run programs on a schedule. For example, you might open a shell and call crontab -e to edit the scheduler and add this line:
* * * * * perl /path/to/script.pl
which will run your script every minute and send you a mail with the output if there is any, including warnings and errors.

What is the preferred cross-platform IPC Perl module?

I want to create a simple IO object that represents a pipe opened to another program, so that I can periodically write to that program's STDIN as my app runs. I want it to be bullet-proof (in that it catches all errors) and cross-platform. The best options I can find are:
open
sub io_read {
    local $SIG{__WARN__} = sub { }; # Silence warning.
    open my $pipe, '|-', @_ or die "Cannot exec $_[0]: $!\n";
    return $pipe;
}
Advantages:
Cross-platform
Simple
Disadvantages
No $SIG{PIPE} to catch errors from the piped program
Are other errors caught?
IO::Pipe
sub io_read {
    IO::Pipe->reader(@_);
}
Advantages:
Simple
Returns an IO::Handle object for OO interface
Supported by the Perl core.
Disadvantages
Still no $SIG{PIPE} to catch errors from the piped program
Not supported on Win32 (or, at least, its tests are skipped)
IPC::Run
There is no interface for writing to a file handle in IPC::Run, only appending to a scalar. This seems…weird.
IPC::Run3
No file handle interface here, either. I could use a code reference, which would be called repeatedly to spool to the child, but looking at the source code, it appears that it actually writes to a temporary file, and then opens it and spools its contents to the piped command's STDIN. Wha?
IPC::Cmd
Still no file handle interface.
What am I missing here? It seems as if this should be a solved problem, and I'm kind of stunned that it's not. IO::Pipe comes closest to what I want, but the lack of $SIG{PIPE} error handling and the lack of support for Windows is distressing. Where is the piping module that will JDWIM?
Thanks to guidance from @ikegami, I have found that the best choice for interactively reading from and writing to another process in Perl is IPC::Run. However, it requires that the program you are reading from and writing to have a known output when it is done writing to its STDOUT, such as a prompt. Here's an example that executes bash, has it run ls -l, and then prints that output:
use v5.14;
use IPC::Run qw(start timeout new_appender new_chunker);

my @command = qw(bash);

# Connect to the other program.
my ($in, @out);
my $ipc = start \@command,
    '<' => new_appender("echo __END__\n"), \$in,
    '>' => new_chunker, sub { push @out, @_ },
    timeout(10) or die "Error: $?\n";

# Send it a command and wait until it has received it.
$in .= "ls -l\n";
$ipc->pump while length $in;

# Wait until our end-of-output string appears.
$ipc->pump until @out && $out[-1] =~ /__END__\n/m;

pop @out;
say @out;
Because it is running as an IPC (I assume), bash does not emit a prompt when it is done writing to its STDOUT. So I use the new_appender() function to have it emit something I can match to find the end of the output (by calling echo __END__). I've also used an anonymous subroutine after a call to new_chunker to collect the output into an array, rather than a scalar (just pass a reference to a scalar to '>' if you want that).
So this works, but it sucks for a whole host of reasons, in my opinion:
There is no generally useful way to know that an IPC-controlled program is done printing to its STDOUT. Instead, you have to use a regular expression on its output to search for a string that usually means it's done.
If it doesn't emit one, you have to trick it into emitting one (as I have done here—god forbid if I should have a file named __END__, though). If I was controlling a database client, I might have to send something like SELECT 'IM OUTTA HERE';. Different applications would require different new_appender hacks.
The writing to the magic $in and $out scalars feels weird and action-at-a-distance-y. I dislike it.
One cannot do line-oriented processing on the scalars as one could if they were file handles. They are therefore less efficient.
The ability to use new_chunker to get line-oriented output is nice, if still a bit weird. That regains a bit of the efficiency on reading output from a program, though, assuming it is buffered efficiently by IPC::Run.
I now realize that, although the interface for IPC::Run could potentially be a bit nicer, overall the weaknesses of the IPC model in particular make it tricky to deal with at all. There is no generally useful IPC interface, because one has to know too much about the specifics of the particular program being run to get it to work. This is okay, maybe, if you know exactly how it will react to inputs, and can reliably recognize when it is done emitting output, and don't need to worry much about cross-platform compatibility. But that was far from sufficient for my need for a generally useful way to interact with various database command-line clients in a CPAN module that could be distributed to a whole host of operating systems.
In the end, thanks to packaging suggestions in comments on a blog post, I decided to abandon the use of IPC for controlling those clients, and to use the DBI, instead. It provides an excellent API, robust, stable, and mature, and suffers none of the drawbacks of IPC.
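For comparison, the DBI route looks roughly like this (sketched here with DBD::SQLite; the driver, database, and table are placeholders):
use strict;
use warnings;
use DBI;

# talk to the database through a driver instead of scripting a command-line client
my $dbh = DBI->connect('dbi:SQLite:dbname=example.db', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE IF NOT EXISTS t (id INTEGER, name TEXT)');
$dbh->do('INSERT INTO t (id, name) VALUES (?, ?)', undef, 1, 'hello');

my $rows = $dbh->selectall_arrayref('SELECT id, name FROM t');
print "$_->[0]: $_->[1]\n" for @$rows;

$dbh->disconnect;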
My recommendation for those who come after me is this:
If you just need to execute another program and wait for it to finish, or collect its output when it is done running, use IPC::System::Simple. Otherwise, if what you need to do is to interactively interface with something else, use an API whenever possible. And if it's not possible, then use something like IPC::Run and try to make the best of it—and be prepared to give up quite a bit of your time to get it "just right."
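The IPC::System::Simple case is about this simple (a sketch; ls -l is just an example command):
use strict;
use warnings;
use IPC::System::Simple qw(systemx capturex);

# run a command and die with a useful message if it fails
systemx('ls', '-l');

# or capture its output as a list of lines
my @lines = capturex('ls', '-l');
print scalar(@lines), " lines of output\n";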
I've done something similar to this, although it depends on the parent program and what you are trying to pipe. My solution was to spawn a child process (leaving $SIG{PIPE} functional) and write any errors to the log, or handle them in whatever way you see fit. I use POSIX to handle my child process and am able to utilize all the functionality of the parent. However, if you're trying to have the child communicate back to the parent, then things get difficult. Do you have an example of the main program and what you're trying to PIPE?
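A rough sketch of the approach described in this answer, with cat standing in for the program being fed and a simple $SIG{PIPE} handler (Unix-flavoured; the list form of pipe open is not available everywhere):
use strict;
use warnings;

# log, rather than die, when the child goes away and the pipe breaks
local $SIG{PIPE} = sub { warn "child closed its end of the pipe\n" };

# 'cat -n' stands in for whatever program you are writing to
open my $to_child, '|-', 'cat', '-n' or die "cannot start child: $!";

print {$to_child} "line $_\n" for 1 .. 3;

close $to_child or warn "child exited with status $?\n";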

In Perl, is there any way to tie a stash?

Similar to the way AUTOLOAD can be used to define subroutines on demand, I am wondering if there is a way to tie a package's stash so that I can intercept access to variables in that package.
I've tried various permutations of the following idea, but none seem to work:
use v5.10;   # for say()

{package Tie::Stash;
    use Tie::Hash;
    BEGIN {our @ISA = 'Tie::StdHash'}
    sub FETCH {
        print "calling fetch\n";
    }
}

{package Target}

BEGIN {tie %Target::, 'Tie::Stash'}

say $Target::x;
This dies with Bad symbol for scalar ... on the last line, without ever printing "calling fetch". If the say $Target::x; line is removed, the program runs and exits properly.
My guess is that the failure has to do with stashes being like, but not the same as hashes, so the standard tie mechanism is not working right (or it might just be that stash lookup never invokes tie magic).
Does anyone know if this is possible? Pure Perl would be best, but XS solutions are ok.
You're hitting a compile-time internal error ("Bad symbol for scalar"); this happens while Perl is trying to work out what '$Target::x' should be, which you can verify by running a debugging Perl with:
perl -DT foo.pl
...
### 14:LEX_NORMAL/XOPERATOR ";\n"
### Pending identifier '$Target::x'
Bad symbol for scalar at foo.pl line 14.
I think the GV for '::Target' is replaced by something else when you tie() it, so that whatever eventually tries to get to its internal hash cannot. Given that tie() is a little bit of a mess, I suspect what you're trying to do won't work, which is also suggested by this (old) set of exchanges on p5p:
https://groups.google.com/group/perl.perl5.porters/browse_thread/thread/f93da6bde02a91c0/ba43854e3c59a744?hl=en&ie=UTF-8&q=perl+tie+stash#ba43854e3c59a744
A little late to the question, but although it's not possible to use tie to do this, Variable::Magic allows you to attach magic to a stash and thereby achieve something similar.
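A sketch of that idea with Variable::Magic (my illustration, untested against every perl version; the exact behaviour of magic on stashes has varied): uvar magic cast on the stash gives a fetch callback that fires when symbols are looked up through it at run time:
use strict;
use warnings;
use feature 'say';
use Variable::Magic qw(wizard cast);

# fetch/store/exists/delete callbacks give per-key ("uvar") magic on a hash;
# casting it on a stash means they fire for symbol lookups through that stash.
my $wiz = wizard(
    fetch => sub {
        my ($stash, $data, $key) = @_;
        say "fetching symbol '$key' from the Target stash";
        return;
    },
);

{ package Target; our $x = 42 }

cast %Target::, $wiz;

my $glob = $Target::{x};   # run-time stash lookup: triggers the fetch callback
say $$glob;                # 42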