BEGIN, CHECK, INIT & END blocks in Perl

As I understand these special blocks in Perl code, BEGIN and CHECK blocks run during the compilation phase, while INIT and END blocks run during the actual execution phase.
I can understand using these blocks inside actual Perl code (Perl libraries) but what about using them inside modules? Is that possible?
When we write use <Module-name>, the module is compiled, so in effect its BEGIN and CHECK blocks run. But how will the INIT and END blocks run, since I don't think module code is run in the true sense? We only use certain functions from inside the modules.

Short: The special code blocks in packages loaded via use are processed and run (or scheduled to run) as encountered, in the same way and order as in main::, since use itself is a BEGIN block.
Excellent documentation on this can be found in perlmod. From this section
A BEGIN code block is executed as soon as possible, that is, the moment it is completely defined, even before the rest of the containing file (or string) is parsed.
Since the use statements are BEGIN blocks they run as soon as encountered. From use
It is exactly equivalent to
BEGIN { require Module; Module->import( LIST ); }
So the BEGIN blocks in a package run in-line with others, as they are encountered. The END blocks in a package are then also compiled in the same order, as well as the other special blocks. As for the order of (eventual) execution
An END code block is executed as late as possible ...
and
You may have multiple END blocks within a file--they will execute in reverse order of definition; that is: last in, first out (LIFO)
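INIT and CHECK blocks are likewise compiled as they are encountered. Per perlmod, CHECK blocks then run in LIFO order once the initial compile phase ends, and INIT blocks run in FIFO order just before run time begins.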
Here is some code to demonstrate these special code blocks used in a package.
File PackageBlocks.pm
package PackageBlocks;
use warnings;
BEGIN { print "BEGIN block in the package\n" }
INIT { print "INIT block in the package\n" }
END { print "END block in the package\n" }
1;
The main script
use warnings;
BEGIN { print "BEGIN in main script.\n" }
print "Running in the main.\n";
INIT { print "INIT in main script.\n" }
use PackageBlocks;
END { print "END in main script.\n" }
BEGIN { print "BEGIN in main script, after package is loaded.\n" }
print "After use PackageBlocks.\n";
Output
BEGIN in main script.
BEGIN block in the package
BEGIN in main script, after package is loaded.
INIT in main script.
INIT block in the package
Running in the main.
After use PackageBlocks.
END in main script.
END block in the package
The BEGIN block in the package runs in order of appearance relative to the ones in main::, and before any INIT block. The END blocks run at the end, and the one in the package runs after the one in main::, since in this example the use (which compiles the package's END block) comes before the END block in main::.
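To see where CHECK fits in as well, here is a minimal single-file sketch. The @log array and the block labels are made up for illustration. It records the order in which the blocks fire before run time; run as an ordinary script, it should print BEGIN 1, BEGIN 2, CHECK 2, CHECK 1, INIT 1, INIT 2.

```perl
use strict;
use warnings;

our @log;    # package variable, so the compile-phase blocks can write to it

BEGIN { push @log, 'BEGIN 1' }   # runs as soon as it is parsed
CHECK { push @log, 'CHECK 1' }   # queued; CHECK blocks run LIFO after compilation
INIT  { push @log, 'INIT 1'  }   # queued; INIT blocks run FIFO before run time
BEGIN { push @log, 'BEGIN 2' }
CHECK { push @log, 'CHECK 2' }
INIT  { push @log, 'INIT 2'  }

# By the time this ordinary statement runs, all six blocks have fired.
print join(', ', @log), "\n";
```

Note that CHECK and INIT blocks are only honoured while the main program is being compiled; code pulled in later (for example via a string eval) is too late to have them run.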

This is very easy to test for yourself.
use Module (and require EXPR and do EXPR and eval EXPR) compile the Perl code and then immediately run it.
That is where the 1; at the end of most modules is picked up. If executing the module's code after compiling it doesn't return a true value then require will fail.
Admittedly there usually isn't much use for an INIT or an END block, because the run-time phase is so intimately tied to the compilation, and because modules are generally about defining subroutines, but the option is there if you want it.
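One way to try this out, as a rough sketch: write two throwaway module files and require them. The names Truthy.pm and Falsy.pm, and the @loaded array they push onto, are invented for this demonstration. Both files get their run-time code executed, but only the one ending in a true value loads successfully.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

our @loaded;    # filled in by the run-time code of each demo module

# Two throwaway module files (hypothetical names): one ends in a true
# value, the other in a false one.
my $dir   = tempdir(CLEANUP => 1);
my %files = (
    'Truthy.pm' => "package Truthy; push \@main::loaded, __PACKAGE__; 1;\n",
    'Falsy.pm'  => "package Falsy;  push \@main::loaded, __PACKAGE__; 0;\n",
);
for my $name (sort keys %files) {
    open my $fh, '>', "$dir/$name" or die "Cannot write $name: $!";
    print {$fh} $files{$name};
    close $fh or die "Cannot close $name: $!";
}

# require compiles each file and then runs it; a false final value is an error.
our %result;
for my $name (sort keys %files) {
    $result{$name} = eval { require "$dir/$name"; 1 } ? 'loaded' : 'failed';
}

print "$_: $result{$_}\n" for sort keys %result;
print "module code that actually ran: @loaded\n";
```

Falsy.pm is reported as failed even though its push statement ran, which shows that the module body is executed and that its final value is what require checks.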

Related

How to define an environment variable before loading modules?

I use the AnyEvent::DNS module.
I want to disable IPv6, so that the resolver only makes a request for A record.
AnyEvent::DNS uses the environment variable $ENV{PERL_ANYEVENT_PROTOCOLS}.
But setting the variable does not work; the resolver still sends two requests, A and AAAA.
Code from AnyEvent::DNS:
our %PROTOCOL; # (ipv4|ipv6) => (1|2), higher numbers are preferred
BEGIN {
    ...;
    my $idx;
    $PROTOCOL{$_} = ++$idx
        for reverse split /\s*,\s*/,
            $ENV{PERL_ANYEVENT_PROTOCOLS} || "ipv4,ipv6";
}
How to define an environment variable before loading modules?
Since the code that checks the environment variable is in a BEGIN block, it will be run immediately once the Perl compiler reaches it.
use statements are executed at compile time, as soon as the compiler has parsed them. So when you use AnyEvent::DNS, Perl loads that module and parses the file right away. BEGIN blocks are executed at that stage, while code in methods will only be compiled, not executed.
So if you have something like the following, the code you showed above will be run before you even set that variable.
use strict;
use warnings;
use AnyEvent::DNS;
$ENV{PERL_ANYEVENT_PROTOCOLS} = 'ipv4';
...
There are two ways you can circumvent that.
You can put the assignment in your own BEGIN block before you load AnyEvent::DNS. That way it will be set first.
use strict;
use warnings;
BEGIN {
    $ENV{PERL_ANYEVENT_PROTOCOLS} = 'ipv4';
}
use AnyEvent::DNS;
Alternatively, you can just call your program with the environment variable set for it from the shell.
$ PERL_ANYEVENT_PROTOCOLS=ipv4 perl resolver.pl
The second one is more portable, in case you later want it to do IPv6 after all.
Read more about BEGIN in perlmod.
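The timing difference can be reproduced without AnyEvent. In this minimal sketch, DEMO_A and DEMO_B are made-up variable names, and the last BEGIN block merely stands in for a module that reads its configuration at compile time. It is an illustration of the ordering only, not of how AnyEvent::DNS itself behaves beyond the snippet quoted above.

```perl
use strict;
use warnings;

our %snapshot;    # what the stand-in "module" saw at compile time

BEGIN { delete @ENV{qw(DEMO_A DEMO_B)} }   # start from a clean slate

$ENV{DEMO_A} = 'ipv4';                     # plain statement: runs at run time
BEGIN { $ENV{DEMO_B} = 'ipv4' }            # BEGIN block: runs at compile time

# Stands in for a module's BEGIN block that reads its configuration.
BEGIN {
    $snapshot{$_} = $ENV{$_} || 'ipv4,ipv6' for qw(DEMO_A DEMO_B);
}

print "DEMO_A seen as $snapshot{DEMO_A}\n";   # ipv4,ipv6 (set too late)
print "DEMO_B seen as $snapshot{DEMO_B}\n";   # ipv4
```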

identify a procedure and replace it with a different procedure

What I want to achieve:
###############CODE########
old_procedure(arg1, arg2);
#############CODE_END######
I have a huge code base which has an old procedure in it. I want every call to old_procedure to go to a new procedure (new_procedure(arg1, arg2)) with the same arguments.
Now I know the question seems pretty stupid, but the trick is that I am not allowed to change the code or old_procedure itself. So the only thing I can do is create a procedure externally which reads the code flow or something and then, whenever it finds old_procedure, replaces it with new_procedure. They have a void type, so I don't have to worry about the return values.
I am using Perl. If someone knows how to at least start in this direction, please comment or answer. It would be nice if the new code can be done in Perl or C, but other known languages are good too: C++, Java.
EDIT: The code is written in shell script and perl. I cannot edit the code and I don't have location of the old_function, I mean I can find it...but its really tough. So I can use the package thing pointed out but if there is a way around it...so that I could parse the thread with that function and replace function calls. Please don't remove tags as I need suggestions from java, C++ experts also.
EDIT: @mirod
So I tried it out, and your answer made a new subroutine, but now there is no way of accessing the old one. I had created a variable whose value decides which way to go (old_sub or new_sub). Is there a way to add that variable to the new code, so that control goes back to the old function if it is not set?
like:
use BadPackage; # sub is defined there
BEGIN
{ package BadPackage;
  no warnings; # to avoid the "Subroutine bad_sub redefined" message
  # check for the variable and send to old_sub if the var is not set
  sub bad_sub
    { # good code
    }
}
# Thanks @mirod
This is easier to do in Perl than in a lot of other languages, but that doesn't mean it's easy, and I don't know if it's what you want to hear. Here's a proof-of-concept:
Let's take some broken code:
# file name: Some/Package.pm
package Some::Package;
use base 'Exporter';
our @EXPORT = qw(forty_two nineteen);
sub forty_two { 19 }
sub nineteen { 19 }
1;
# file name: main.pl
use Some::Package;
print "forty-two plus nineteen is ", forty_two() + nineteen();
Running the program perl main.pl produces the output:
forty-two plus nineteen is 38
It is given that the files Some/Package.pm and main.pl are broken and immutable. How can we fix their behavior?
One way we can insert arbitrary code to a perl command is with the -M command-line switch. Let's make a repair module:
# file: MyRepairs.pm
CHECK {
no warnings 'redefine';
*forty_two = *Some::Package::forty_two = sub { 42 };
};
1;
Now running the program perl -MMyRepairs main.pl produces:
forty-two plus nineteen is 61
Our repair module uses a CHECK block to execute code in between the compile-time and run-time phase. We want our code to be the last code run at compile-time so it will overwrite some functions that have already been loaded. The -M command-line switch will run our code first, so the CHECK block delays execution of our repairs until all the other compile time code is run. See perlmod for more details.
This solution is fragile. It can't do much about modules loaded at run-time (with require ... or eval "use ..."; these are common) or about subroutines defined in other CHECK blocks (these are rare).
If we assume the shell script that runs main.pl is also immutable (i.e., we're not allowed to change perl main.pl to perl -MMyRepairs main.pl), then we move up one level and pass the -MMyRepairs in the PERL5OPT environment variable:
PERL5OPT="-I/path/to/MyRepairs -MMyRepairs" bash the_immutable_script_that_calls_main_pl.sh
These are called automated refactoring tools and are common for other languages. For Perl though you may well be in a really bad way because parsing Perl to find all the references is going to be virtually impossible.
Where is the old procedure defined?
If it is defined in a package, you can switch to the package, after it has been used, and redefine the sub:
use BadPackage; # sub is defined there
BEGIN
{ package BadPackage;
  no warnings; # to avoid the "Subroutine bad_sub redefined" message
  sub bad_sub
    { # good code
    }
}
If the code is in the same package but in a different file (loaded through a require), you can do the same thing without having to switch package.
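For the follow-up in the question (falling back to the old sub when a variable is not set), one possible approach is to save a reference to the original sub before replacing it. This is only a sketch under assumptions: the inline BadPackage below stands in for the real, immutable module, and $USE_NEW is a hypothetical switch.

```perl
use strict;
use warnings;

# Stand-in for the immutable module (hypothetical contents).
{
    package BadPackage;
    sub bad_sub { return "old(@_)" }
}

our $USE_NEW = 1;    # hypothetical switch: true selects the replacement

{
    no warnings 'redefine';    # silence "Subroutine ... redefined"
    my $old = \&BadPackage::bad_sub;    # keep a reference to the original
    *BadPackage::bad_sub = sub {
        return $USE_NEW ? "new(@_)" : $old->(@_);
    };
}

print BadPackage::bad_sub(1, 2), "\n";    # new(1 2)
$USE_NEW = 0;
print BadPackage::bad_sub(1, 2), "\n";    # old(1 2)
```

Because the wrapper closes over $old, the original code stays reachable even though the name bad_sub now points at the new sub.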
If all the code is in the same file, then change it.
sed -i 's/old_procedure/new_procedure/g' codefile
Is this what you mean?

Force Perl to call END subroutines when ending with exec()?

When you use exec() in Perl:
Note that exec will not call your END blocks, nor will it invoke DESTROY methods on your objects.
How do I force perl to call END blocks anyway? Can I do something like END(); exec($0) or whatever?
I really am trying to make the program end its current instance and start a brand new instance of itself, and am too lazy to do this correctly (using cron or putting the entire program in an infinite loop). However, my END subroutines cleanup temp files and other important things, so I need them to run between executions.
Unhelpful links to code:
https://github.com/barrycarter/bcapps/blob/master/bc-metar-db.pl
https://github.com/barrycarter/bcapps/blob/master/bc-voronoi-temperature.pl
https://github.com/barrycarter/bcapps/blob/master/bc-delaunay-temperature.pl
So you're trying to execute a program within your script? exec probably isn't what you want then. exec behaves like the C exec: what gets called replaces your current process; to keep going, you'd have to do something like a fork to preserve your current process while executing another.
But good news! That all exists in the system builtin.
Does exactly the same thing as exec LIST , except that a fork is done first and the parent process waits for the child process to exit.
Here's what it looks like:
use 5.012; # or any newer version
use warnings;
... # some part of my program
system($my_command, $arg1, $arg2); # forks, execs, returns.
END {
# still gets called because you never left the script.
}
If you absolutely must use an exec, you must call your cleanup routine yourself. To understand more about END, see perldoc perlmod for full details. The short of it: END is one of several types of code block that get executed at a particular stage in the execution of the script. They are NOT subroutines. However, you can execute any code you want in those blocks. So you can do:
sub cleanup { ... } # your cleanup code
sub do_exec {
    cleanup();
    exec( ... );
}
END {
    cleanup();
}
and then you know your cleanup code will be executed at either script exit OR when you do your exec.
To answer the narrow question of how to invoke your END blocks at arbitrary times, you can use the B::end_av method with B::SV::object_2svref to get the code references to your END blocks.
sub invoke_end_blocks_before_exec {
    use B;
    my @ENDS = B::end_av->ARRAY;
    foreach my $END (@ENDS) {
        $END->object_2svref->();
    }
}
END { print "END BLOCK 1\n" }
END { print "END BLOCK 2\n" }
...
invoke_end_blocks_before_exec();
exec("echo leave this program and never come back");
Output:
END BLOCK 2
END BLOCK 1
leave this program and never come back
I would usually prefer something less magical, though. Why not a structure like
sub cleanup { ... }
END { &cleanup }
if (need_to_exec()) {
    cleanup(); # same thing END was going to do anyway
    exec( ... );
}
?
Fork and exec
It'll leave you with a new pid, but you could do a fork/exec:
my $pid = fork();
defined $pid or die "fork failed";
exit if $pid; # parent immediately exits, calling END blocks.
exec($0) or die "exec failed"; # child immediately execs, will not call END blocks (but parent did, so OK)
This strikes me as far less fragile than mucking with internals or trying to make sure your exec is in the final END block.
Wrap your program
Also, it is trivial to just wrap your Perl program in a shell (or Perl) script that looks something like this:
#!/bin/sh
while sleep 5m; do
perl your-program.pl
done
or
#!/usr/bin/perl
while (1) {
    system("perl your-program.pl");
    sleep(5*60);
}
Can you put your call to exec in at the end of the (final) END block? Where your current call to exec is, set a flag, then exit. At the end of the END block, check the flag, and if it's true, call exec there. This way, you can exit your script without restarting, if necessary, and still have the END blocks execute.
That said, I'd recommend not implementing this type of process-level tail recursion.

perlmod question

In the example in perlmod/Perl Modules there is a BEGIN block. I looked at some modules but none of these had a BEGIN block. Should I use such a BEGIN block when writing a module or is it dispensable?
You only need a BEGIN block if you need to execute some code at compile time versus run-time.
An example: Suppose you have a module Foo.pm in a non-standard library directory (like /tmp). You know you can have perl find the module by modifying @INC to include /tmp. However, this will not work:
unshift(@INC, '/tmp');
use Foo; # perl reports Foo.pm not found
The problem is that the use statement is executed at compile time whereas the unshift statement is executed at run time, so when perl looks for Foo.pm, the include path hasn't been modified (yet).
The right way to accomplish this is:
BEGIN { unshift(@INC, '/tmp') };
use Foo;
Now the unshift statement is executed at compile-time and before the use Foo statement.
The vast majority of scripts will not require BEGIN blocks. A lot of what you need in BEGIN blocks can be obtained through use-ing other modules. For instance, in this case we could make sure /tmp is in @INC by using the lib.pm module:
use lib '/tmp';
use Foo;
A BEGIN block in a module is entirely dispensable. You only use it if there is something that must be done by your module when it is loaded, before it is used. There are seldom reasons to do much at that point, so there are seldom reasons to use a BEGIN block.
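The compile-time versus run-time difference for the include path can be observed directly. A minimal sketch, assuming a Unix-style system; the two /tmp/demo-* directories are made-up paths that need not exist, and @at_compile_time is just a helper for the demonstration:

```perl
use strict;
use warnings;

our @at_compile_time;    # demo entries visible while the file is compiling

unshift @INC, '/tmp/demo-runtime';    # run time: too late for any `use`
use lib '/tmp/demo-lib';              # compile time, like BEGIN { unshift ... }

# Snapshot taken at compile time, after `use lib` but before the unshift runs.
BEGIN { @at_compile_time = grep { m{^/tmp/demo-} } @INC }

my @at_run_time = grep { m{^/tmp/demo-} } @INC;

print "visible at compile time: @at_compile_time\n";   # /tmp/demo-lib
print "visible at run time:     @at_run_time\n";       # /tmp/demo-runtime /tmp/demo-lib
```

Although the unshift appears first in the file, its directory is missing from the compile-time snapshot, which is why a use Foo placed after it would not find Foo.pm there.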

Why does Perl run END and CHECK blocks in LIFO order?

I have no deep or interesting question--I'm just curious why it is so.
Each package is assumed to rely on the correct function of EVERYTHING that went before it. END blocks are intended to "clean up and close out" anything the package might need to take care of before the program finishes. But this work might rely on the correct functioning of the packages started earlier, which might no longer be true if they are allowed to run their END blocks.
If you did it any other way, there could be bad bugs.
Here is a simple example which may help:
# perl
BEGIN { print "(" }
END { print ")" }
BEGIN { print "[" }
END { print "]" }
This outputs: ([])
If END had been a FIFO then BEGIN/END wouldn't work well together.
Update - excerpt from Programming Perl 3rd edition, Chapter 18: Compiling - Avant-Garde Compiler, Retro Interpreter, page 483:
If you have several END blocks within a file, they execute in reverse order of their definition. That is, the last END block defined is the first one executed when your program finishes. This reversal enables related BEGIN and END blocks to nest the way you'd expect, if you pair them up
/I3az/
Perl borrows heavily from C, and END follows the lead of C's atexit:
NAME
atexit - register a function to run at process termination
SYNOPSIS
#include <stdlib.h>
int atexit(void (*func)(void));
DESCRIPTION
The atexit() function shall register the function pointed to by func, to be called without arguments at normal program termination. At normal program termination, all functions registered by the atexit() function shall be called, in the reverse order of their registration …