Correct way of variable declaration in Perl

Correct way of variable declaration in Perl - perl

I have a set of 3 or 4 separate Perl scripts that used to be part of a simple pipeline, but I am now trying to combine them in a single script for easier use (for now without subroutine functions). The thing is that several variables with the same name are defined in the different scripts. The workaround I found was to give different names to those variables, but it can start to become messy and probably it is not the correct way of doing so.
I know the concept of global and local variables but I do not quite understand how do they exactly work.
Are there any rules of thumb for dealing with this sort of variables? Do you know any good documentation that can shed some light on variable-scope or have any advise on this?
Thanks.
EDITED: I already use "use warnings; use strict;" and declare variables with "my". The question might actually be more related to the definition of scoping blocks and how to get them to be independent from each other...

You are likely getting into trouble because of your use of global variables (which actually likely exist in package main). You should try to avoid the use of global variables.
And to do so, you should become familiar with the meaning of variable scope. Although somewhat dated, Coping with Scoping offers a good introduction to this topic. Also see this answer and the others to the question How to properly use Global variables in perl. (Short Answer: avoid them to the degree possible.)
The principle of variable scope and limiting use of global variables actually applies to nearly all programming languages. You should get in the habit of declaring variables as close as possible to the point where you are actually using them.
And finally, to save yourself from a lot of headaches, get in the habit of:
including use strict; and use warnings; at the top every Perl source file, and
declaring variables with my within each of your sub's (to limit the scope of those variables to the sub).
(See this PerlMonks article for more on this recommendation.)
I refer to this practice as "Perl programming with your seat belt on." :-)

The rule of thumb is to put your code in subroutines, each of them focused on a simple, well-defined part of the larger process. From this one decision flow many virtuous outcomes, including a natural solution to the variable scoping problem you asked about.
sub foo {
my $x = 99;
...
}
sub bar {
my $x = 1234; # Won't interfere with foo's $x.
...
}
If, for some reason, you really don't want to do that, you can wrap each section of the code in scoping blocks, and make sure you declare all variables with my (you should be doing the latter nearly always as a matter of common practice):
{
my $x = 99;
...
}
{
my $x = 1234; # Won't interfere with the other $x.
...
}

Related

Perlcritic: How can I resolve '^Magic variable "$ENV" should be assigned as "local"'?

I am writing a perl script that needs to set a number of environment variables before calling an external program. My code has the form
$ENV{'VAR1'} = "value1";
$ENV{'VAR2'} = "value2";
When running this through perlcritic, I get a severity 4 violation for every such assignement:
^Magic variable "$ENV" should be assigned as "local"
Googling that error message didn't give me any good solutions. The perlcritic violation that complains in that case is Variables::RequireLocalizedPunctuationVars, and the example given deals with localizing a file handle. I tried to find the relevant section in Perl Best Practices, but it only talks about localizing package variables.
One solution I tried is to localize %ENV using the following statement before the assignments.
local %ENV = ();
This doesn't resolve the violation.
My question is the following:
Is that Perlcritic violation even relevant for assignments to %ENV, or can I ignore it?
If it's relevant, what's the best way to resolve it?

Perlcritic warnings are not The Word of God. They are simply warnings about situations that, if managed incorrectly, could get you into trouble.
Is that Perlcritic violation even relevant for assignments to %ENV, or
can I ignore it?
This warning tells you that:
Global Variables have the very real possibility of action at a distance.
This possibility is even more dangerous when dealing with those variables that change the operation of built in functions.
Is that relevant for %ENV? If you spawn more than one child process in your program, Yes. If someone changes your program later to spawn another child, Yes.
If it's relevant, what's the best way to resolve it?
Now here is where being the human being becomes important. You need to make a value judgement.
Possible actions:
Ignore the warning and hope that future maintainers aren't bitten by your usage of this Global Variable.
Change your code to avoid the situation you are being warned about. The syntax suggested by choroba is a fine option.
Now if you have made a change, are still getting the warning, and believe that the warning is now in error, you can do one or more of:
Be a good citizen and Submit a bug report.
use the comment ## no critic (RequireLocalizedPunctuationVars) on the affected lines or on it's own line before the lines in question.
Or more excessively disable the rule or just create an exception for %ENV in your .perlcriticrc file.

You can localize the value for the given environment variable only:
local $ENV{VAR1} = 'value1';

Consider using Env. perlcritic does not complain about the variable:
use warnings;
use strict;
use Env qw(VAR);
$VAR = "value1";

Why doesn't this run forever?

I was looking at a rather inconclusive question about whether it is best to use for(;;) or while(1) when you want to make an infinite loop and I saw an interesting solution in C where you can #define "EVER" as a constant equal to ";;" and literally loop for(EVER).
I know defining an extra constant to do this is probably not the best programming practice but purely for educational purposes I wanted to see if this could be done with Perl as well.
I tried to make the Perl equivalent, but it only loops once and then exits the loop.
#!/usr/bin/perl -w
use strict;
use constant EVER => ';;';
for (EVER) {
print "FOREVER!\n";
}
Output:
FOREVER!
Why doesn't this work in perl?

C's pre-processor constants are very different from the constants in most languages.
A normal constant acts like a variable which you can only set once; it has a value which can be passed around in most of the places a variable can be, with some benefits from you and the compiler knowing it won't change. This is the type of constant that Perl's constant pragma gives you. When you pass the constant to the for operator, it just sees it as a string value, and behaves accordingly.
C, however, has a step which runs before the compiler even sees the code, called the pre-processor. This actually manipulates the text of your source code without knowing or caring what most of it means, so can do all sorts of things that you couldn't do in the language itself. In the case of #DEFINE EVER ;;, you are telling the pre-processor to replace every occurrence of EVER with ;;, so that when the actual compiler runs, it only sees for(;;). You could go a step further and define the word forever as for(;;), and it would still work.
As mentioned by Andrew Medico in comments, the closest Perl has to a pre-processor is source filters, and indeed one of the examples in the manual is an emulation of #define. These are actually even more powerful than pre-processor macros, allowing people to write modules like Acme::Bleach (replaces your whole program with whitespace while maintaining functionality) and Lingua::Romana::Perligata (interprets programs written in grammatically correct Latin), as well as more sensible features such as adding keywords and syntax for class and method declarations.

It doesn't run forever because ';;' is an ordinary string, not a preprocessor macro (Perl doesn't have an equivalent of the C preprocessor). As such, for (';;') runs a single time, with $_ set to ';;' that one time.

Andrew Medico mentioned in his comment that you could hack it together with a source filter.
I confirmed this, and here's an example.
use Filter::cpp;
#define EVER ;;
for (EVER) {
print "Forever!\n";
}
Output:
Forever!
Forever!
Forever!
... keeps going ...
I don't think I would recommend doing this, but it is possible.

This is not possible in Perl. However, you can define a subroutine named forever which takes a code block as a parameter and runs it again and again:
#!/usr/bin/perl
use warnings;
use strict;
sub forever (&) {
$_[0]->() while 1
}
forever {
print scalar localtime, "\n";
sleep 1;
};

"Fake" global lexical variables in Common Lisp

It is stated in section "Global variables and constants" of the Google Common Lisp Style Guide that:
"Common Lisp does not have global lexical variables, so a naming convention is used to ensure that globals, which are dynamically bound, never have names that overlap with local variables.
It is possible to fake global lexical variables with a differently named global variable and a DEFINE-SYMBOL-MACRO. You should not use this trick, unless you first publish a library that abstracts it away."
Can someone, please, help me to understand the meaning of this last sentence.

The last sentence,
You should not use this trick, unless you first publish a library that abstracts it away.
means that if you do something that simulates global lexical variables, then the implementation of that simulation should not be apparent to the user. For instance, you might simulate a global lexical using some scheme using define-symbol-macro, but if you do, it should be transparent to the user. See Ron Garret's GLOBALS — Global Variables Done Right for an example of “a library that abstracts it away.”

Why doesn't map read from #ARGV/#_?

Is there a good reason for map to not read from #_ (in functions) or #ARGV (anywhere else) when not given an argument list?

I can't say why Larry didn't make map, grep and the other list functions operate on #_ like pop and shift do, but I can tell you why I wouldn't. Default variables used to be in vogue, but Perl programmers have discovered that most of the "default" behaviors cause more problems than they solve. I doubt they would make it into the language today.
The first problem is remembering what a function does when passed no arguments. Does it act on a hidden variable? Which one? You just have to know by rote, and that makes it a lot more work to learn, read and write the language. You're probably going to get it wrong and that means bugs. This could be mitigated by Perl being consistent about it (ie. ALL functions which take lists operate on #_ and ALL functions which take scalars operate on $_) but there's more problems.
The second problem is the behavior changes based on context. Take some code outside of a subroutine, or put it into a subroutine, and suddenly it works differently. That makes refactoring harder. If you made it work on just #_ or just #ARGV then this problem goes away.
Third is default variables have a tendency to be quietly modified as well as read. $_ is dangerous for this reason, you never know when something is going to overwrite it. If the use of #_ as the default list variable were adopted, this behavior would likely leak in.
Fourth, it would probably lead to complicated syntax problems. I'd imagine this was one of the original reasons keeping it from being added to the language, back when $_ was in vogue.
Fifth, #ARGV as a default makes some sense when you're writing scripts that primarily work with #ARGV... but it doesn't make any sense when working on a library. Perl programmers have shifted from writing quick scripts to writing libraries.
Sixth, using $_ as default is a way of chaining together scalar operations without having to write the variable over and over again. This might have been mitigated if Perl was more consistent about its return values, and if regexes didn't have special syntax, but there you have it. Lists can already be chained, map { ... } sort { ... } grep /.../, #foo, so that use case is handled by a more efficient mechanism.
Finally, it's of very limited use. It's very rare that you want to pass #_ to map and grep. The problems with hidden defaults are far greater than avoiding typing two characters. This space savings might have slightly more sense when Perl was primarily for quick and dirty work, but it makes no sense when writing anything beyond a few pages of code.
PS shift defaulting to #_ has found a niche in my $self = shift, but I find this only shines because Perl's argument handling is so poor.

The map function takes in a list, not an array. shift takes an array. With lists, on the other hand, #_/#ARGV may or may not be fair defaults.

Is it okay to use modules from within subroutines?

Recently I start playing with OO Perl and I've been creating quite a bunch of new objects for a new project that I'm working on. As I'm unfamilliar with any best practice regarding OO Perl and we're kind in a tight rush to get it done :P
I'm putting a lot of this kind of code into each of my function:
sub funcx{
use ObjectX; # i don't declare this on top of the pm file
# but inside the function itself
my $obj = new ObjectX;
}
I was wondering if this will cause any negative impact versus putting on the use Object line on top of the Perl modules outside of any function scope.
I was doing this so that I feel it's cleaner in case I need to shift the function around.
And the other thing that I have noticed is that when I try to run a test.pl script on the unix server itself which test my objects, it slow as heck. But when the same code are run through CGI which is connected to an apache server, the web page doesn't load as slowly.

Where to put use?
use occurs at compile time, so it doesn't matter where you put it. At least from a purely pragmatic, 'will it work', point of view. Because it happens at compile time use will always be executed, even if you put it in a conditional. Never do this: if( $foo eq 'foo' ) { use SomeModule }
In my experience, it is best to put all your use statements at the top of the file. It makes it easy to see what is being loaded and what your dependencies are.
Update:
As brian d foy points out, things compiled before the use statement will not be affected by it. So, the location can matter. For a typical module, location does not matter, however, if it does things that affect compilation (for example it imports functions that have prototypes), the location could matter.
Also, Chas Owens points out that it can affect compilation. Modules that are designed to alter compilation are called pragmas. Pragmas are, by convention, given names in all lower-case. These effects apply only within the scope where the module is used. Chas uses the integer pragma as an example in his answer. You can also disable a pragma or module over a limited scope with the keyword no.
use strict;
use warnings;
my $foo;
print $foo; # Generates a warning
{ no warnings 'unitialized`; # turn off warnings for working with uninitialized values.
print $foo; # No warning here
}
print $foo; # Generates a warning
Indirect object syntax
In your example code you have my $obj = new ObjectX;. This is called indirect object syntax, and it is best avoided as it can lead to obscure bugs. It is better to use this form:
my $obj = ObjectX->new;
Why is your test script slow on the server?
There is no way to tell with the info you have provided.
But the easy way to find out is to profile your code and see where the time is being consumed. NYTProf is another popular profiling tool you may want to check out.
Best practices
Check out Perl Best Practices, and the quick reference card. This page has a nice run down of Damian Conway's OOP advice from PBP.
Also, you may wish to consider using Moose. If the long script startup time is acceptable in your usage, then Moose is a huge win.

question 1
It depends on what the module does. If it has lexical effects, then it will only affect the scope it is used in:
my $x;
{
use integer;
$x = 5/2; #$x is now 2
}
my $y = 5/2; #$y is now 2.5
If it is a normal module then it makes no difference where you use it, but it is common to use all of those modules at the top of the program.
question 2
Things that can affect the speed of a program between machines
speed of the processor
version of modules installed (some modules have XS versions that are much faster)
version of Perl
number of entries in PERL5LIB
speed of the drive

daotoad and Chas. Owens already answered the part of your question pertaining to the position of use statements. Let me remark on something else here:
I was doing this so that I feel it's
cleaner in case I need to shift the
function around.
Personally, I find it much cleaner to have all the used modules in one place at the top of the file. You won't have to search for use statements to see what other modules are being used and a quick glance will tell you what is being used and even what is not being used.
Regarding your performance problem: with Apache and mod_perl the Perl interpreter will have to parse and compile your used modules only once. The next time the script is run, execution should be much faster. On the command line, however, a second run doesn't get this benefit.