How to implement import semantics into the current block scope? - perl

The documentation of use indicates that:
Some ... pseudo-modules import semantics into the current block scope (like strict or integer, unlike ordinary modules, which import symbols into the current package (which are effective through the end of the file)).
Similarly, autodie
Replace functions with ones that succeed or die with lexical scope
How to implement import semantics into the current block scope with ordinary modules?

strict and warnings are implemented using some special flag variables that don't contain room for user pragmas. Starting with perl 5.10, you can write your own lexically scoped pragmas. perlpragma contains information on how to do so. You can also browse the source of existing pragmatic modules.
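A minimal sketch of the perlpragma approach, assuming Perl 5.10 or later. The pragma name shout and its say_it function are hypothetical, and the package is defined inline (with %INC set by hand) only so that the example fits in one file; a real pragma would normally live in its own shout.pm.

```perl
use strict;
use warnings;

# Hypothetical pragma "shout", defined inline so the example is one file.
BEGIN {
    package shout;
    $INC{'shout.pm'} = __FILE__;    # pretend shout.pm is already loaded

    # "use shout" / "no shout" store a flag in the compile-time hint hash %^H,
    # which perl scopes to the enclosing block.
    sub import   { $^H{'shout/on'} = 1 }
    sub unimport { $^H{'shout/on'} = 0 }

    # Element 10 of caller() is the hint hash in effect where we were called.
    sub say_it {
        my ($text) = @_;
        my $hints = (caller(0))[10];
        return $hints && $hints->{'shout/on'} ? uc $text : $text;
    }
}

my @out;
{
    use shout;
    push @out, shout::say_it('inside');
    {
        no shout;
        push @out, shout::say_it('nested');
    }
    push @out, shout::say_it('inside again');
}
push @out, shout::say_it('outside');

print join(', ', @out), "\n";    # INSIDE, nested, INSIDE AGAIN, outside
```

The behaviour changes per block without exporting any symbol, which is the difference from an ordinary module's import.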

Related

Moving from CGI to mod_perl. Understanding my, our, local

I've been using Apache mod_cgi for some years. Now I am moving to mod_perl and I have found some problems, especially with subroutines. Until now I never used my, our or local, and the CGI scripts worked without problems. After reading the documentation and even some previous questions posted here, I understand more or less how my, our and local work. My concern is what information is going to be shared between requests (if I understand correctly, that's the main concern I must have while running mod_perl instead of mod_cgi).
Is there any difference between using our in a scalar or just the scalar without declaring anything special such as my? Aren't both global?
If I do not declare the scalar as private, is it going to be shared in the next request? Even in a request for a different Perl script on the same server?
How can I share the value of a scalar inside a subroutine to outside that subroutine but not outside the same file nor the same request?
If I declare a scalar with my inside an if at the top level of the file or in a subroutine, and after that I write another if where I use the same scalar, is that scalar shared between both ifs, or is each if a different block? What about while and for: are they separate blocks for a scalar previously declared with my, or does that only apply to subroutines and files?
mod_perl works by wrapping each Perl script in a subroutine called handler within a package based on the name and path of the script. Instead of starting a new process to run each script, this handler subroutine is called by one of a number of persistent Perl threads.
Ordinarily this knowledge would help a lot to understand the changes in environment from mod_cgi, but since you have never added use strict to your programs and become familiar with the workings of declared variables you have a lot of catching up to do!
The mod_perl environment has the potential for causing non-obvious security breaches, and you should start now to use strict on every script and declare every variable. use Carp will also help you to understand the error logs.
A variable name declared with our is a lexically-scoped synonym for a package variable of the same name that can be used without fully qualifying the name by including the package name. For instance, ordinarily a variable declared with our $var will provide access to the $main::var scalar (if there has been no preceding package declaration) without specifying main::. However, such variables that began life with a value of undef in mod_cgi will now retain their values from the previous execution of any given mod_perl thread, and for consistency it is safest to always initialise them at the point of declaration. Note also that the default package name is no longer main because of the wrapping that mod_perl does, so you can no longer access package variables using the main:: prefix, and it is unwise to find the actual name of the package and explicitly use that because it will be a very long name and will change if you move or rename your script.
A my variable is one that exists independently of the package symbol table, and normally its lifetime is the run time of the enclosing file (for variables declared at file scope) or subroutine. They are safe in mod_perl if both declared and used at file scope of the script or entirely within one subroutine, but you can be stung if you mix scopes and declare a my $global at file scope and then try to use it in a subroutine. The reason for this isn't simple, but it is caused by mod_perl wrapping your script in a handler subroutine so you have nested subroutine declarations. The inner subroutine will tend to adopt only the first instantiation of $global and ignore any others created by later calls to handler. If you need a global variable you should declare it with our and initialise it in that declaration as described above.
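The nested-subroutine problem can be reproduced outside mod_perl. The following is only a sketch: handler here is an ordinary hand-written subroutine standing in for the wrapper that mod_perl generates, and show, handler_fixed and show_fixed are hypothetical names.

```perl
use strict;
use warnings;
no warnings 'closure';    # hide the "will not stay shared" warning for the demo

# Stand-in for the handler sub that mod_perl wraps around a script.
sub handler {
    my ($request) = @_;
    my $global = $request;          # a "file-scope" my in the original script

    sub show { return $global }     # named sub: bound to the first $global only

    return show();
}

# Same idea with an our variable that is assigned on every call.
our $shared;
sub handler_fixed {
    ($shared) = @_;
    return show_fixed();
}
sub show_fixed { return $shared }

my @stale = (handler('first'), handler('second'));
my @fresh = (handler_fixed('first'), handler_fixed('second'));

print "my:  @stale\n";    # my:  first first
print "our: @fresh\n";    # our: first second
```

With warnings left on, perl reports the problem at compile time as "Variable "$global" will not stay shared".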
A local variable is very like an our variable in that it forms a synonym to a package variable. However it temporarily saves the current value of that variable and provides a new copy for use until the end of the file or block scope. Because of its automatic creation and deletion within its scope it can be a useful alternative to a my variable in mod_perl scripts, particularly where you are using pointers to data structures like, say, an instance of the CGI class. Declaring our $cgi = CGI->new would correctly create the object but, because of mod_perl's persistence, would leave it in memory until the next execution of the thread deletes it to make room for another one.
As for your questions:
Using a variable without declaring it causes a compile-time error if use strict is in place, as it should be. Otherwise it refers to the variable of that name in the current package namespace.
Variables are either package variables or lexical variables; there is no way to declare a variable as private as such. Lexical variables (declared with my) will be created and destroyed with each execution of the script, unless you have created an invalid closure as described above by writing a subroutine that uses a variable declared at a wider scope, when the variable will be persistent but won't do what you want it to. A variable declared with our will retain its value across calls to the script, while one declared with local will be destroyed when the script terminates. Both our and local variables are package variables and all references to the same variable name refer to the same variable.
To declare a variable that is consistently accessible everywhere within any one call of a script you can either use a local variable or an initialised our variable. At file scope local $global is largely equivalent to our $global = undef for mod_perl scripts. If you use an our variable to point to a data structure then remember to destroy it at the end of the script with undef $global.
my variables are unique to, and visible within, the block in which they are declared, whether that is a block within an if, while or for, or even just a bare { ... } block scope. Always use my variables for temporary work variables that are used only within a block and accessed from nowhere else.
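A short sketch of the block scoping described in the last point; the variable names are made up for illustration.

```perl
use strict;
use warnings;

my @log;
my $name = 'file scope';

if (1) {
    my $name = 'first if';      # new variable, visible only in this block
    push @log, $name;
}
if (1) {
    push @log, $name;           # the first if's $name is gone: file-scope one
}
for my $i (1 .. 2) {
    my $name = "loop pass $i";  # a fresh $name on every pass
    push @log, $name;
}
{
    my $name = 'bare block';
    push @log, $name;
}
push @log, $name;               # file-scope variable was never modified

print "$_\n" for @log;
```

Each if, for and bare block gets its own $name, and none of them changes the file-scope variable.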
I hope this helps
Edit: this is general information on Perl variable scoping only. Please see Borodin's post for specific mod_perl issues.
Variables declared with my are lexical. In other words, they exist only within the current scope. You should declare all of your variables with my by default; only do something else when you specifically want different functionality.
Using lexically-scoped variables is a basic part of good code design in (almost) any language. Putting use strict; and use warnings; in all of your scripts will require you to follow this good practice.
our is a way of declaring a global variable; the underlying result is very similar to using undeclared globals. However, it has two differences:
You are explicitly stating that you want the variable to be global. This is a good practice to follow, since use of global variables should be an exceptional case. Because of this, you can create a global in this way even if you use strict;.
The variable declared with our will be accessible by the name you declare throughout all packages in the current scope. An undeclared variable, by contrast, is only accessible by simple name within the current package. Outside of that, you could only refer to it as $package::variable.
See the documentation for our for more details.
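A small sketch of the second difference, using the hypothetical packages Alpha and Beta: the short name introduced by our follows the lexical scope, not the package statement.

```perl
use strict;
use warnings;

package Alpha;
our $setting = 'from Alpha';    # alias for $Alpha::setting

package Beta;
# Still inside the lexical scope of the "our" above, so the short name
# keeps referring to $Alpha::setting even though the package changed.
my $seen_in_beta = $setting;

package main;
print "$seen_in_beta\n";        # from Alpha
print "$Alpha::setting\n";      # from Alpha
```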
local does not create a lexical variable; instead, it is a way to give a global variable a temporary value within the current scope. It is mostly used with Perl's special built-in (punctuation) variables:
{
local $/; #make the record separator undefined in this scope only.
my $file = <FILE>; #read in an entire file at once.
}
You can go far simply by using my at all times for your variables and using local only for special cases like that shown above.
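Here is a runnable variation of the snippet above. It is a sketch that reads from an in-memory filehandle (an assumption made so that no real file is needed) and checks that $/ is back to its default once the block ends.

```perl
use strict;
use warnings;

my $data = "line 1\nline 2\nline 3\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my $first = <$fh>;              # normal mode: one line

my $rest;
{
    local $/;                   # $/ is undef inside this block only
    $rest = <$fh>;              # everything that is left, in one read
}

# After the block, local has put the old value of $/ back.
my $restored = (defined $/ && $/ eq "\n") ? 'yes' : 'no';
close $fh;

print "first:    $first";
print "rest:     $rest";
print "restored: $restored\n";
```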

How to import all "our"-variables from the unnamed Perl module without listing them?

I need to import all our variables from the unnamed Perl module (Module.pm) and use them inside the Perl script (Script.pl).
The following code works well without "use strict", but fails with it. How can I change this code to work with "use strict" without manually listing all the imported variables (as described in the answer to another question)?
Thanks a lot for your help!
Script.pl:
use strict;
require Module;
print $Var1;
Module.pm:
our $Var1 = "1\n";
...
our $VarN = "N\n";
return 1;
Run the script:
$> perl Script.pl
Errors:
Global symbol "$Var1" requires explicit package name at Script.pl line 3.
Execution of Script.pl aborted due to compilation errors.
NOTE (1): The module is unnamed, so using a Module:: prefix is not an option.
NOTE (2): Module.pm contains also a set of functions configured by global variables.
NOTE (3): Variables are different and should NOT be stored in one array.
NOTE (4): Design is NOT good, but the question is not about the design. It's about forcing of the listed code to work with minimal modifications with the complexity O(1), i.e. a few lines of code that don't depend on the N.
Solution Candidate (ACCEPTED): Add $:: before all imported variables. It's compliant with strict and also makes it possible to tell my variables apart from imported ones in the code.
Change your script to:
use strict;
require Module;
print $Module::Var1;
The problem is the $Var1 isn't in the main namespace, it's in Module's namespace.
Edit: As is pointed out in comments below, you haven't named your module (i.e. it doesn't say package Module; at the top). Because of this, there is no Module namespace. Changing your script to:
use strict;
require Module;
print $main::Var1;
...allows the script to correctly print out 1\n.
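The same effect can be seen in a single file. This is only a sketch: the bare block below stands in for the package-less Module.pm, since both a block and a required file end the scope of an our declaration.

```perl
use strict;
use warnings;

# Stand-in for the package-less Module.pm: no package statement, so these
# are really $main::Var1 and $main::VarN.
{
    our $Var1 = "1\n";
    our $VarN = "N\n";
}

# print $Var1;   # would not compile here: the "our" alias ended with the block

my $output = $::Var1 . $main::VarN;   # $:: is shorthand for $main::
print $output;
```

The variables themselves are still there; only the short names went out of scope, which is why the fully qualified forms satisfy strict.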
If you have to import all the our variables in every module, there's something seriously wrong with your design. I suggest that you redesign your program to separate the elements so there is a minimum of cross-talk between them. This is called decoupling.
You want to export all variables from a module, and you want to do it in such a way that you don't even know what you're exporting? Forget about use strict and use warnings because if you put them in your program, they'll just run screaming out, and curl up in a corner weeping hysterically.
I never, and I don't mean hardly ever, never export variables. I always create a method to pull out the required value. It gives me vital control over what I'm exposing to the outside world and it keeps the user's namespace pure.
Let's look at the possible problems with your idea.
You have no idea what is being exported in your module. How is the program that uses that module going to know what to use? Somewhere, you have to document that the variables $foo and @bar are available for use. If you have to do that, why not simply play it safe?
You have the issue of someone changing the module, and suddenly a new variable is being exported into the program using that module. Imagine if that variable was already in use. The program suddenly has a bug, and you'll never be able to figure it out.
You are exporting a variable in your module, and the developer decides to modify that variable, or even removes it from the program. Again, because you have no idea what is being imported or exported, there's no way of knowing why a bug suddenly appeared in the program.
As I mentioned, you have to know somewhere what is being used in your module that the program can use, so you have to document it anyway. If you're going to insist on importing variables, at least use the @EXPORT_OK array and the Exporter module. That will help limit the damage. This way, your program can declare what variables it's depending upon and your module can declare what variables it knows programs might be using. If I am modifying the module, I would be extra careful of any variable I see I am exporting. And, if you must specify in your program what variables you're importing, you know to be cautious about those particular variables.
Otherwise, why bother with modules? Why not simply go back to Perl 3.0, use require instead of use, and forget about using the package statement?
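A minimal sketch of the @EXPORT_OK approach. My::Settings and describe are hypothetical names, and the module is declared inline inside a BEGIN block (with %INC set by hand) only to keep the example in one file; normally it would be a separate .pm file.

```perl
use strict;
use warnings;

BEGIN {
    package My::Settings;               # hypothetical module name
    use Exporter 'import';              # borrow Exporter's import method

    # Nothing is exported by default; callers must ask for each name.
    our @EXPORT_OK = qw($Var1 $VarN describe);

    our $Var1 = "1\n";
    our $VarN = "N\n";
    sub describe { return "Var1 is $Var1" }

    $INC{'My/Settings.pm'} = __FILE__;  # mark the inline module as loaded
}

# The import list documents exactly which names this script depends on.
use My::Settings qw($Var1 describe);

print $Var1;          # imported, so it passes strict without a prefix
print describe();     # Var1 is 1
```

$VarN was not requested, so it stays out of the script's namespace and is only reachable as $My::Settings::VarN.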
It sounds like you have data in a file and are trying to load that data into your program.
As it is now, the our declarations in the module only declare variables for the scope of that file. Once the file finishes running, you need to use the fully qualified names to access the variables. If your module has a package xyz; line, then the fully qualified name is $xyz::Var1. If there is no package declaration, then the default package main is used, giving your variables names like $main::Var1.
However, any time that you are making many variables all with numeric name changes, you probably should be using an array.
Change your module to something like:
@My::Module::Data = ("1\n", "2\n" ... )
and then access the items by index:
$My::Module::Data[1]

Should Perl boilerplate go before or after a package declaration?

Assuming there's only one package in a file, does the order of the following Perl boilerplate matter? If there are no technical reasons, are there any aesthetic ones?
use 5.006;
use strict;
use warnings;
package foo;
The order matters if any part of your boilerplate imports any subroutines or variables, or does anything tricky with the caller's namespace.
If you get into the habit of placing it before the package name, then one day when you want to add use List::Util 'reduce'; to your boilerplate, the subroutine will be imported into main instead of foo. So package foo will not have reduce imported, and you may be scratching your head for a while trying to figure out why it isn't working.
The reason why it doesn't matter with the three imports you have shown is that they are all pragmatic modules (or assertions), and their effect is lexically scoped, not package scoped. Placed at the top of the file, those pragmas will be in effect for the entire file.
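A quick way to check where an import landed, as a sketch: it uses the can method that every package inherits from UNIVERSAL to ask each package whether it has a reduce sub.

```perl
use strict;
use warnings;
use List::Util 'reduce';    # placed before the package line: lands in main

package foo;

# can() reports whether a package has a sub of that name.
my $in_main = 'main'->can('reduce') ? 'yes' : 'no';
my $in_foo  = 'foo'->can('reduce')  ? 'yes' : 'no';

# strict is lexical, so it still applies here; uncommenting the next line
# would be a compile-time error even though we are now in package foo.
# $undeclared = 1;

print "reduce in main: $in_main\n";    # reduce in main: yes
print "reduce in foo:  $in_foo\n";     # reduce in foo:  no
```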
The order doesn't matter from a technical standpoint.
It's always been my practice (and, fwiw, what's used in Perl Best Practices) to put the package declaration at the very start. I'd suggest a blank line before the package declaration if there's anything before it, to make it stand out.

Old .pl modules versus new .pm modules

I'm a beginner in Perl and I'm trying to build in my head the best ways of structuring a Perl program. I'm proficient in Python and I'm used to the Python from foo import bar way of importing functions and classes from Python modules. As I understand it, in Perl there are many ways of doing this (.pm and .pl modules, @EXPORT and @ISA, use and require, etc.) and it is not easy for a beginner to get a clear idea of the differences, advantages and drawbacks of each (even after reading Beginning Perl and Intermediate Perl).
The problem stated, my current question is related to a sentence from perldoc perlmod:
Perl module files have the extension .pm. The use operator assumes this so you don't have to spell out "Module.pm" in quotes. This also helps to differentiate new modules from old .pl and .ph files.
Which are the differences between old .pl way of preparing modules and the new .pm way?
Are they really the old and the modern way? (I assume they are because Perlmod says that but I would like to get some input about this).
The use function and .pm-type modules were introduced in Perl 5, released 16 years ago next month. The "old .pl and .ph files" perlmod is referring to were used with Perl 4 (and earlier). At this point, they're only interesting to computer historians. For your purposes, just forget about .pl libraries.
Which are the differences between old .pl way of preparing modules and the new .pm way?
You can find a few old modules inside Perl's own standard library (pointed to by @INC; the paths can be seen in the perl -V output).
In older times, there were no packages. One would write e.g. require "open2.pl"; which essentially includes the content of the file as it is in the calling script. All declared functions and all global variables became part of the script's context. In other words: they polluted your context. Including several files might have led to all sorts of conflicts.
New modules use the package keyword to define their own namespace. When use-d by a script, new modules can avoid importing anything into the script's immediate context, which prevents namespace pollution and potential conflicts.
The @EXPORT/@EXPORT_OK lists are used by the standard utility module Exporter, which helps to import the module's functions into the calling context so that one doesn't have to write the full name of the functions all the time. The lists are generally customized by the module depending on the parameter list passed to use, as in use POSIX qw/:errno_h/;. See perldoc Exporter for more details.
@ISA is Perl's inheritance mechanism. It tells Perl that if it can't find a function inside the current package, it should look for it in all the packages mentioned in @ISA. Simple modules often list only Exporter there, in order to use its import() method (which is also well described in the same perldoc Exporter).
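A minimal sketch of the @ISA lookup described above, using the hypothetical packages Animal and Dog.

```perl
use strict;
use warnings;

package Animal;
sub new   { my ($class) = @_; return bless {}, $class }
sub speak { return 'generic noise' }

package Dog;
our @ISA = ('Animal');          # Dog inherits from Animal
sub speak { return 'Woof' }     # found in Dog itself, so @ISA is not consulted

package main;

my $dog = Dog->new;             # no Dog::new, so perl finds Animal::new via @ISA
print ref($dog), "\n";          # Dog
print $dog->speak, "\n";        # Woof
```

Note that the search through @ISA applies to method calls (the -> syntax); a plain function call such as Dog::new() would not consult it.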
Reusing code by creating .pl files (the "pl" actually stands for "Perl library") was the way that it was done back in Perl 4 - before we had the 'package' keyword and the 'use' statement.
It's a nasty old way of doing things. If you're coming across documentation that recommends it then that's a strong indication that you should ignore that documentation as it's either really old or written by someone who hasn't kept up to date with Perl development for over fifteen years.
For some examples of the different ways of building Perl modules in the modern way, see my answer to Perl Module Method Calls: Can't call method “X” on an undefined value at ${SOMEFILE} line ${SOMELINE}
I don't know anything about .pl modules other than that they existed some time ago; nobody seems to use them nowadays, so you probably shouldn't use them either.
Stick to .pm modules and ignore @ISA for now; that's for OOP. Exporting isn't that important either, because you can always call your functions fully qualified.
So rather than writing this:
file: MyPkg.pm
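package MyPkg;
use Exporter 'import';   # needed for @EXPORT to have any effect
our @EXPORT = qw(func1 func2);
sub func1 { ... };
sub func2 { ... };
1;   # a module file must return a true value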
file: main.pl
#!/usr/bin/perl
use strict;
use warnings;
use MyPkg;
&func1();
you should, for the beginning, write that:
file: MyPkg.pm
package MyPkg;
sub func1 { ... };
sub func2 { ... };
1;   # a module file must return a true value
file: main.pl
#!/usr/bin/perl
use strict;
use warnings;
use MyPkg;
&MyPkg::func1();
And later, when you see which functions should really be exported, you can do that without having to change your existing code.
use loads your module and calls import, which makes any exported subs available in your current package. In the second example a require would do, which doesn't call import, but I tend to always use 'use'.
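The difference can be sketched with two core modules (List::Util and Scalar::Util are picked here purely as examples): the BEGIN block spells out roughly what use does, while the plain require only loads the module.

```perl
use strict;
use warnings;

# Roughly what "use List::Util 'max';" does: load at compile time, then import.
BEGIN {
    require List::Util;
    List::Util->import('max');
}
my $largest = max(3, 7, 5);     # short name works after import

# A plain require loads the module at run time and imports nothing,
# so the fully qualified name is needed.
require Scalar::Util;
my $numeric = Scalar::Util::looks_like_number('42') ? 'yes' : 'no';

print "$largest $numeric\n";    # 7 yes
```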

Does Perl monkey-patching allow you to see the patched package's scope?

I'm monkey patching a package using a technique given at the beginning of "How can I monkey-patch an instance method in Perl?". The problem that I'm running into is that the original subroutine used a package-level my variable which the patched subroutine appears not to have access to, either by full path specification or implicit use.
Is there any way to get at the data scoped in this way for use in the patched subroutine?
You can obtain lexicals with the PadWalker module. Evil, but it works.
No. The thing you're mistaken in is that they are not package scoped. A lexical variable is by definition limited to its lexical scope, in other words, the block it is in.
Lexicals (ie: declared with 'my') are not visible outside the lexical scope (file or block) in which they are declared. That's the whole point of lexical variables.
If there is a subroutine/method which is in the same scope as the lexical var, then it can return the value of the lexical and that can allow indirect access to the var from outside its scope.
There is no such thing as a 'full path specification' for lexical variables. That's for package variables. If the var was declared with 'our' instead of 'my' you could do that.
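A sketch of the indirect access mentioned above, assuming the patched package happens to provide (or can be given) an accessor compiled in the same scope as the lexical. Counter, bump and count are hypothetical names.

```perl
use strict;
use warnings;

{
    package Counter;                    # hypothetical package being patched
    my $count = 10;                     # lexical: invisible outside this block
    sub bump  { return ++$count }
    sub count { return $count }         # accessor compiled in the same scope
}

my $before = Counter::bump();           # original sub sees $count directly

# Monkey patch: the replacement sub is compiled out here, so it cannot name
# $count itself; it has to go through the accessor.
{
    no warnings 'redefine';
    *Counter::bump = sub { return Counter::count() + 100 };
}

my $after = Counter::bump();
print "$before $after\n";               # 11 111
```

If no such accessor exists and the package cannot be changed, that leaves something like PadWalker, as mentioned in the first answer.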