Why declare Perl variable with "my" at file scope? - perl

I'm learning Perl and trying to understand variable scope. I understand that my $name = 'Bob'; will declare a local variable inside a sub, but why would you use the my keyword at the global scope? Is it just a good habit so you can safely move the code into a sub?
I see lots of example scripts that do this, and I wonder why. Even with use strict, it doesn't complain when I remove the my. I've tried comparing behaviour with and without it, and I can't see any difference.
Here's one example that does this:
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
my $dbfile = "sample.db";
my $dsn = "dbi:SQLite:dbname=$dbfile";
my $user = "";
my $password = "";
my $dbh = DBI->connect($dsn, $user, $password, {
PrintError => 0,
RaiseError => 1,
AutoCommit => 1,
FetchHashKeyName => 'NAME_lc',
});
# ...
$dbh->disconnect;
Update
It seems I was unlucky when I tested this behaviour. Here's the script I tested with:
use strict;
my $a = 5;
$b = 6;
sub print_stuff() {
print $a, $b, "\n"; # prints 56
$a = 55;
$b = 66;
}
print_stuff();
print $a, $b, "\n"; # prints 5566
As I learned from some of the answers here, $a and $b are special variables that are already declared, so the compiler doesn't complain. If I change the $b to $c in that script, then it complains.
As for why to use my $foo at the global scope, it seems like the file scope may not actually be the global scope.

The addition of my was about the best thing that ever happened to Perl and the problem it solved was typos.
Say you have a variable $variable. You do some assignments and comparisons on this variable.
$variable = 5;
# intervening assignments and calculations...
if ( $varable + 20 > 25 ) # don't use magic numbers in real code
{
# do one thing
}
else
{
# do something else
}
Do you see the subtle bug in the above code that happens if you don't use strict; and require variables be declared with my? The # do one thing case will never happen. I encountered this several times in production code I had to maintain.
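To see strict catching exactly this kind of typo, here is a minimal sketch. The helper name compile_error is hypothetical (it is not from the code above); it just compiles a snippet with eval and hands back whatever error Perl reports.

```perl
use strict;
use warnings;

# Hypothetical helper: compile a snippet and return the error text ('' if none).
sub compile_error {
    my ($code) = @_;
    my $ok = eval "$code; 1";
    return $ok ? '' : $@;
}

# The same typo as above, this time with strict and "my" in play.
my $snippet = <<'END_CODE';
use strict;
my $variable = 5;
if ( $varable + 20 > 25 ) { print "one thing\n" }
END_CODE

my $error = compile_error($snippet);
print "strict caught the typo:\n$error" if $error;
```

With strict enabled, the misspelled $varable is reported at compile time as an undeclared global instead of silently evaluating to undef.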

A few points:
strict demands that all variables be declared with a my (or state) or installed into the package--declared with an our statement or a use vars pragma (archaic), or inserted into the symbol table at compile time.
They are that file's variables. They remain of no concern and no use to any module required during the use of that file.
They can be used across packages (although that's a less good reason.)
Lexical variables don't have any of the magic that the only alternative does. You can't "push" and "pop" a lexical variable as you change scope, as you can with any package variable. No magic means faster and plainer handling.
Laziness. It's just easier to declare a my with no brackets as opposed to concentrating its scope by specific bracketing.
{ my $visible_in_this_scope_only;
...
sub bananas {
...
my $bananas = $visible_in_this_scope_only + 3;
...
}
} # End $visible_in_this_scope_only
(Note on the syntax: in my code, I never use a bare brace. It will always tell you, either before (standard loops) or after, what the scope is for, even if it would have been "obvious".)
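The "push and pop" point about package variables can be sketched like this. All names here ($pkg_level, $lex_level and the four subs) are made up for illustration.

```perl
use strict;
use warnings;

our $pkg_level = 'outer';   # package variable
my  $lex_level = 'outer';   # file-scoped lexical

sub show_pkg { return $pkg_level }
sub show_lex { return $lex_level }

sub with_local {
    local $pkg_level = 'inner';   # temporarily replaces the value; callees see it
    return show_pkg();
}

sub with_my {
    my $lex_level = 'inner';      # a brand-new variable; show_lex() cannot see it
    return show_lex();
}

print with_local(), "\n";   # inner
print show_pkg(),   "\n";   # outer (the old value is restored on scope exit)
print with_my(),    "\n";   # outer
```

local only works on package variables, which is the "magic" lexicals don't have: a my variable is visible only to code written inside its scope, never to code that merely gets called from there.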

It's just good practice. As a personal rule, I try to keep variables in the smallest scope possible. If a line of code can't see a variable, then it can't mess with it in unexpected ways.
I'm surprised that you found that the script worked under use strict without the my, though. That's generally not allowed:
$ perl -E 'use strict; $db = "foo"; say $db'
Global symbol "$db" requires explicit package name at -e line 1.
Global symbol "$db" requires explicit package name at -e line 1.
Execution of -e aborted due to compilation errors.
$ perl -E 'use strict; my $db = "foo"; say $db'
foo
Variables $a and $b are exempt:
$ perl -E 'use strict; $b = "foo"; say $b'
foo
But I don't know how you would make the code you posted work with strict and a missing my.
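As a rough illustration of why $a and $b get a pass: sort uses them as package variables for each comparison, so strict has to allow them undeclared. The variable names in this sketch are mine, and the eval is only there to show the contrast with $c.

```perl
use strict;
use warnings;

# sort sets the package variables $a and $b for each comparison,
# so they compile under strict without any declaration.
my @sorted = sort { $a <=> $b } (10, 9, 100, 1);
print "@sorted\n";   # 1 9 10 100

# Any other undeclared name is rejected; the string eval inherits "use strict".
my $compiled = eval q{ $c = 6; 1 };
print "but \$c is rejected: $@" unless $compiled;
```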

A sub controls/limits the scope of variables between the braces {} that define its operations. Of course many variables exist outside of a particular function and using lexical my for "global" variables can give you more control over how "dynamic" their behavior is inside your application. The Private Variables via my() section of perldoc perlsub discusses reasons for doing this pretty thoroughly.
I'm going to quote myself from elsewhere which is not the best thing to do on SO but here goes:
The classic perlmonks node - Variable Scoping in Perl: the basics - is a frequently consulted reference :-)
As I noted in a comment, Bruce Gray's talk at YAPC::NA 2012 - The why of my() is a good story about how a pretty expert perl programmer wrapped his head around perl and namespaces.
I've heard people explain my as Perl's equivalent to Javascript's var - it's practically necessary but, Perl being perl, things will work without it if you insist or take pains to make it do that.
ps: Actually with Javascript, I guess functions are used to control "scope" in a way that is analogous to your description of using my in subs.

Related

Can variable declarations be placed in a common script

Before I start, the whole 'concept' may be technically impossible; hopefully someone will have more knowledge about such things, and advise me.
With Perl, you can "declare" global variables at the start of a script via my / our thus:
my ($a,$b,$c ..)
That's fine with a few unique variables. But I am using about 50 of them ... and the same names (not values) are used by five scripts. Rather than having to place huge my( ...) blocks at the start of each file, I'm wondering if there is a way to create them in one script. Note: Declare the namespace, not their values.
I have tried placing them all in a single file, with the shebang at the top, and a 1 at the bottom, and then tried "require", "use" and "do" to load them in. But - at certain times -the script complains it cannot find the global package name. (Maybe the "paths.pl" is setting up the global space relative to itself - which cannot be 'seen' by the other scripts)
Looking on Google, somebody suggested setting variables in the second file, and still setting the my in the calling script ... but that is defeating the object of what I'm trying to do, which is simply declare the name space once, and setting the values in another script
** So far, it seems if I go from a link in an HTML page to a perl script, the above method works. But when I call a script via XHTMLRequest using a similar setup, it cannot find the $a, $b, $c etc within the "paths" script
HTML
<form method="post" action="/cgi-bin/track/script1.pl">
<input type="submit" value="send"></form>
Perl: (script1.pl)
#shebang
require "./paths.pl";
$a=1;
$b="test";
print "content-type: text/html\n\n";
print "$a $b";
Paths.pl
our($a,
$b,
$c ...
);
1;
Seems to work OK, with no errors. But ...
# Shebang
require "./paths.pl";
XHTMLREQUEST script1.pl
Now it complains it cannot find $a or $b etc as an "explicit package" for "script1.pl"
Am I moving into the territory of "modules" - of which I know little. Please bear in mind, I am NOT declaring values within the linked file, but rather setting up the 'global space' so that they can be used by all scripts which declare their own values.
(On a tangent, I thought - in the past - a file in the same directory could be accessed as "paths.pl" -but it won't accept that, and it insists on "./" Maybe this is part of the problem. I have tried absolute and relative paths too, from "url/cgi-bin/track/" to "/cgi-bin/track" but can't seem to get that to work either)
I'm fairly certain it's finding the paths file as I placed a "my value" before the require, and set a string within paths, and it was able to print it out.
First, lexical (my) variables only exist in their scope. A file is a scope, so they only exist in their file. You are now trying to work around that, and when you find yourself fighting the language that way, you should realize that you are doing it wrong.
You should move away from declaring all variables in one go at the top of a program. Declare them near the scope you want to use them, and declare them in the smallest scope possible.
You say that you want to "Set up a global space", so I think you might misunderstand something. If you want to declare a lexical variable in some scope, you just do it. You don't have to do anything else to make that possible.
Instead of this:
my( $foo, $bar, $baz );
$foo = 5;
sub do_it { $bar = 9; ... }
while( ... ) { $baz = 6; ... }
Declare the variable just where you want them:
my $foo = 5;
sub do_it { my $bar = 9; ... }
while( ... ) { my $baz = 6; ... }
Every lexical variable should exist in the smallest scope that can tolerate it. That way nothing else can mess with it and it doesn't retain values from previous operations when it shouldn't. That's the point of them, after all.
When you declare them at file scope instead of in the scope that uses them, you might have two unrelated uses of the same name conflicting with each other. One of the main benefits of lexical variables is that you don't have to know the names of any other variables in scope or in the program:
my( $foo, ... );
while( ... ) {
$foo = ...;
do_something();
...
}
sub do_something {
$foo = ...;
}
Are those uses of $foo in the while and the sub the same, or do they accidentally have the same name? That's a cruel question to leave up to the maintenance programmer.
If they are the same thing, make the subroutine get its value from its argument list instead. You can use the same names, but since each scope has its own lexical variables, they don't interfere with each other:
while( ... ) {
my $foo = ...;
do_something($foo);
...
}
sub do_something {
my( $foo ) = @_;
}
See also:
How to share/export a global variable between two different perl scripts?
You say you aren't doing what I'm about to explain, but other people may want to do something similar to share values. Since you are sharing the same variable names across programs, I suspect that this is actually what is going on, though.
In that case, there are many modules on CPAN that can do that job. What you choose depends on what sort of stuff you are trying to share between programs. I have a chapter in Mastering Perl all about it.
You might be able to get away with something like this, where one module defines all the values and makes them available for export:
# in Local/Config.pm
package Local::Config;
use Exporter qw(import);
our @EXPORT = qw( $foo $bar );
our $foo = 'Some value';
our $bar = 'Different value';
1;
To use this, merely load it with use. It will automatically import the variables that you put in @EXPORT:
# in some program
use Local::Config;
We cover lots of this sort of stuff in Intermediate Perl.
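If you want to try the export idea without creating a separate file, a single-file sketch could look like the following. Putting the package in a BEGIN block and setting %INC by hand is my own device to keep the example self-contained; in real code Local/Config.pm would live on disk as shown above.

```perl
use strict;
use warnings;

# Inline stand-in for Local/Config.pm (assumption: normally a separate file).
BEGIN {
    package Local::Config;
    use Exporter qw(import);
    our @EXPORT = qw( $foo $bar );
    our $foo = 'Some value';
    our $bar = 'Different value';
    $INC{'Local/Config.pm'} = 1;   # mark as loaded so "use" does not search the disk
}

use Local::Config;   # calls Local::Config->import, which exports $foo and $bar

# No "my" or "our" needed here: imported variables satisfy strict.
print "$foo / $bar\n";   # Some value / Different value
```

Because the import happens at compile time, the names are already declared by the time the rest of the script is compiled.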
What you want to do here is a form of boilerplate management. Shoving variable declarations into a module or class file. This is a laudable goal. In fact you should shove as much boilerplate into that other module as possible. It makes it far easier to keep consistent behavior across the many scripts in a project. However shoving variables in there will not be as easy as you think.
First of all, $a and $b are special variables reserved for use in sort blocks so they never have to be declared. So using them here will not validate your test. require searches @INC for the file unless the path is absolute or begins with ./ or ../. See perlfunc require.
To declare a variable it has to be done at compile time. our, my, and state all operate at compile time and legalize a symbol in a lexical scope. Since a module is a scope, and require and do both create a scope for that file, there is no way to have our (let alone my and state) reach back to a parent scope to declare a symbol.
This leaves you with two options. Export package globals back to the calling script or munge the script with a source filter. Both of these will give you heartburn. Remember that it has to be done at compile time.
In the interest of computer science, here's how you would do it (but don't do it).
#boilerplate.pm
use strict;
use vars qw/$foo $bar/;
1;
__END__
#script.pl
use strict;
use boilerplate;
$foo = "foo here";
use vars is how you declare package globals when strict is in effect. Package globals are unscoped ("global") so it doesn't matter what scope or file they're declared in. (NB: our does not create a global like my creates a lexical. our creates a lexical alias to a global, thus exposing whatever is there.) Notice that boilerplate.pm has no package declaration. Its code is therefore compiled in package main, the same package as the script, which is what you want.
The second way using source filters is devious. You create a module that rewrites the source code of your script on the fly. See Filter::Simple and perlfilter for more information. This only works on real scripts, not perl -e ....
#boilerplate.pm
package boilerplate;
use strict; use diagnostics;
use Filter::Simple;
my $injection = '
our ($foo, $bar);
my ($baz);
';
FILTER { s/__FILTER__/$injection/; }
__END__
#script.pl
use strict; use diagnostics;
use boilerplate;
__FILTER__
$foo = "foo here";
You can make any number of filtering tokens or scenarios for code substitution. e.g. use boilerplate qw/D2_loadout/;
These are the only ways to do it with standard Perl. There are modules that let you meddle with calling scopes through various B modules but you're on your own there. Thanks for the question!
HTH

What is wrong in "my $foo = $x if $y" syntax?

In my last question here, @amon gave a great answer. However, he also said:
First of all, please don't do my $foo = $x if $y. You get unexpected
and undefined behavior, so it is best to avoid that syntax.
Because I have seen the above construction in a great many sources on CPAN, I'm wondering how, when, and where it can go wrong. (Some example code would be nice.) I'm also wondering why Perl allows it if it is bad.
His wording was actually a bit laxer. That wording is actually mine. Let's start with the documentation: (Emphasis in original)
NOTE: The behaviour of a my, state, or our modified with a statement modifier conditional or loop construct (for example, my $x if ...) is undefined. The value of the my variable may be undef, any previously assigned value, or possibly anything else. Don't rely on it. Future versions of perl might do something different from the version of perl you try it out on. Here be dragons.
To be more precise, the problem is using a lexical variable when its my may not have been executed.
Consider:
# Usage:
# f($x) # Store a value
# f() # Fetch and clear the stored value
sub f {
my $x if !@_;
if (@_) {
$x = $_[0];
} else {
return $x;
}
}
f('abc');
say "<", f(), ">";  # <abc>
This is obviously not the documented behaviour of my.
Because I have seen the above construction in a great many sources on CPAN
That code is buggy. If you want a value to persist between calls to a sub, you can use a state variable since Perl 5.10, or a variable outside of the sub.
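For comparison, here is a sketch of the same store-and-fetch sub written with a state variable, assuming Perl 5.10 or later. The variable names are mine.

```perl
use strict;
use warnings;
use feature 'state';

# f($value) stores a value; f() fetches and clears the stored value.
sub f {
    state $stored;              # initialized once, persists between calls
    if (@_) {
        $stored = $_[0];
        return;
    }
    my $value = $stored;
    undef $stored;
    return $value;
}

f('abc');
print "<", f() // '', ">\n";   # <abc>
print "<", f() // '', ">\n";   # <>
```

Unlike my $x if ..., the persistence here is documented behaviour, so it won't change under you in a later perl.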

What is the preferred way of interpolating a constant in a here doc?

I'm sure there are several ways of getting the value 'bar' to interpolate in the <> below, but what is the cleanest way, and why?
use constant FOO => 'bar';
my $msg = <<EOF;
Foo is currently <whatever goes here to expand FOO>
EOF
There are two kinds of here-docs:
<<'END', which behaves roughly like a single quoted string (but no escapes), and
<<"END", also <<END, which behaves like a double quoted string.
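A quick sketch of the difference between the two kinds (variable names are just for illustration):

```perl
use strict;
use warnings;

my $name = 'world';

# Single-quoted terminator: nothing is interpolated, escapes stay literal.
my $literal = <<'END';
Hello, $name\t!
END

# Double-quoted terminator: variables and escapes are processed.
my $interpolated = <<"END";
Hello, $name\t!
END

print $literal;        # Hello, $name\t!
print $interpolated;   # Hello, world, a tab character, then !
```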
To interpolate a value in a double quoted string use a scalar variable:
my $foo = "bar";
my $msg = "Foo is currently $foo\n";
Or use the arrayref interpolation trick
use constant FOO => "bar";
my $msg = "Foo is currently @{[ FOO ]}\n";
You could also define a template language to substitute in the correct value. This may or may not be better depending on your problem domain:
my %vars = (FOO => "bar");
my $template = <<'END';
Foo is currently %FOO%;
END
(my $msg = $template) =~ s{%(\w+)%}{$vars{$1} // die "Unknown variable $1"}eg;
The problem with a lot of the CPAN modules that do a nicer job of constants than the use constant pragma is that they just aren't part of the standard Perl package. Unfortunately, it can be very difficult to download CPAN modules on machines you might not own.
Therefore, I've just decided to stick to use constant until Perl starts to include something like Readonly as part of its standard modules (and only when distros like RedHat and Solaris decide to update to those versions of Perl. I'm still stuck with 5.8.8 on our production servers.)
Fortunately, you can interpolate constants defined with use constant if you know the arcane and mystical incantations that have been passed down from hacker to hacker.
Put @{[...]} around the constant. This can also work with methods from classes too:
use 5.12.0;
use constant {
FOO => "This is my value of foo",
};
my $data =<<EOT;
this is my very long
value of my variable that
also happens to contain
the value of the constant
'FOO' which has the value
of @{[FOO]}
EOT
say $data;
Output:
this is my very long
value of my variable that
also happens to contain
the value of the constant
'FOO' which has the value
of This is my value of foo
Using a method:
say "The employee's name is @{[$employee->Name]}";
Aside:
There is also another way to use constants that I used to employ before use constant was around. It went like this:
*FOO = \"This is my value of foo";
our $FOO;
my $data =<<EOT;
this is my very long
value blah, blah, blah $FOO
EOT
say $data;
You can use $FOO as any other scalar value, and it can't be modified. You try to modify the value and you get:
Modification of a read-only value attempted at ...
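If you want to see that error without killing the script, wrapping the assignment in eval works. This is only a sketch; $assigned and $error are names I made up.

```perl
use strict;
use warnings;

our $FOO;
*FOO = \"This is my value of foo";   # alias $FOO to a read-only string literal

# Try to overwrite it; the assignment dies, and eval traps the error.
my $assigned = eval { $FOO = 'something else'; 1 };
my $error    = $@;

print "assignment failed: $error" unless $assigned;
print "FOO is still: $FOO\n";
```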
Use Const::Fast instead of Readonly or constant. They interpolate without any contortions. See CPAN modules for defining constants:
For conditional compilation, constant is a good choice. It's a mature module and widely used.
...
If you want array or hash constants, or immutable rich data structures, use Const::Fast. It's a close race between that and Attribute::Constant, but Const::Fast seems maturer, and has had more releases.
On the other hand, you seem to be writing your own templating code. Don't. Instead, use something simple like HTML::Template:
use HTML::Template;
use constant FOO => 'bar';
my $tmpl = HTML::Template->new(scalarref => \ <<EOF
Foo is currently <TMPL_VAR VALUE>
EOF
);
$tmpl->param(VALUE => FOO);
print $tmpl->output;
Have you considered using "read-only variables" as constants?
perlcritic recommends it at severity level 4 (default is level 5)
use Readonly;
Readonly my $FOO => 'bar';
my $msg = <<"EOF";
Foo is currently <$FOO>
EOF
P.S. Module Const::Fast (inspired by module Readonly) seems to be a better choice.
Late to the party, but another version of the arrayref trick can do this in a scalar context: ${\FOO}. Example, tested in perl 5.22.2 on cygwin:
use constant FOO=>'bar';
print <<EOF
backslash -${\FOO}-
backslash and parens -${\(FOO)}-
EOF
produces
backslash -bar-
backslash and parens -bar-
Thanks to d-ash for introducing me to this technique, which he uses in his perlpp source preprocessor here (see also this answer). (Disclaimer: I am now the lead maintainer of perlpp - GitHub; CPAN.)

perl encapsulate single variable in double quotes

In Perl, is there any reason to encapsulate a single variable in double quotes (no concatenation) ?
I often find this in the source of the program I am working on (written 10 years ago by people that don't work here anymore):
my $sql_host = "something";
my $sql_user = "somethingelse";
# a few lines down
my $db = sub_for_sql_conection("$sql_host", "$sql_user", "$sql_pass", "$sql_db");
As far as I know there is no reason to do this. When I work in an old script I usually remove the quotes so my editor colors them as variables not as strings.
I think they saw this somewhere and copied the style without understanding why it is so. Am I missing something ?
Thank you.
All this does is explicitly stringify the variables. In 99.9% of cases, it is a newbie error of some sort.
There are things that may happen as a side effect of this calling style:
my $foo = "1234";
sub bar { $_[0] =~ s/2/two/ }
print "Foo is $foo\n";
bar( "$foo" );
print "Foo is $foo\n";
bar( $foo );
print "Foo is $foo\n";
Here, stringification created a copy and passed that to the subroutine, circumventing Perl's pass by reference semantics. It's generally considered to be bad manners to munge calling variables, so you are probably okay.
You can also stringify an object or other value here. For example, undef stringifies to the empty string. Objects may specify arbitrary code to run when stringified. It is possible to have dual valued scalars that have distinct numerical and string values. This is a way to specify that you want the string form.
There is also one deep spooky thing that could be going on. If you are working with XS code that looks at the flags that are set on scalar arguments to a function, stringifying the scalar is a straightforward way to say to perl, "Make me a nice clean new string value" with only stringy flags and no numeric flags.
I am sure there are other odd exceptions to the 99.9% rule. These are a few. Before removing the quotes, take a second to check for weird crap like this. If you do happen upon a legit usage, please add a comment that identifies the quotes as a workable kludge, and give their reason for existence.
In this case the double quotes are unnecessary. Moreover, using them is inefficient as this causes the original strings to be copied.
However, sometimes you may want to use this style to "stringify" an object. For example, URI objects support stringification:
my $uri = URI->new("http://www.perl.com");
my $str = "$uri";
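URI is not a core module, so here is a self-contained sketch of the same idea. The Temperature class is entirely made up, and the package block syntax assumes Perl 5.14 or later; it just overloads stringification the way URI does.

```perl
use strict;
use warnings;

# Hypothetical class with overloaded stringification.
package Temperature {
    use overload '""' => sub {
        my ($self) = @_;
        return "$self->{degrees} C";
    };

    sub new {
        my ($class, $degrees) = @_;
        return bless { degrees => $degrees }, $class;
    }
}

my $t   = Temperature->new(21);
my $str = "$t";   # the quotes call the overloaded '""' and yield a plain string

print ref($t), "\n";                                    # Temperature
print ref($str) eq '' ? "plain string: $str\n" : "still an object\n";
```

Here the quotes are doing real work: $t is an object, while $str is an ordinary string that other code can treat as such.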
I don't know why, but it's a pattern commonly used by newcomers to Perl. It's usually a waste (as it is in the snippet you posted), but I can think of two uses.
It has the effect of creating a new string with the same value as the original, and that could be useful in very rare circumstances.
In the following example, an explicit copy is done to protect $x from modification by the sub because the sub modifies its argument.
$ perl -E'
sub f { $_[0] =~ tr/a/A/; say $_[0]; }
my $x = "abc";
f($x);
say $x;
'
Abc
Abc
$ perl -E'
sub f { $_[0] =~ tr/a/A/; say $_[0]; }
my $x = "abc";
f("$x");
say $x;
'
Abc
abc
By virtue of creating a copy of the string, it stringifies objects. This could be useful when dealing with code that alters its behaviour based on whether its argument is a reference or not.
In the following example, explicit stringification is done because require handles references in @INC differently than strings.
$ perl -MPath::Class=file -E'
BEGIN { $lib = file($0)->dir; }
use lib $lib;
use DBI;
say "ok";
'
Can't locate object method "INC" via package "Path::Class::Dir" at -e line 4.
BEGIN failed--compilation aborted at -e line 4.
$ perl -MPath::Class=file -E'
BEGIN { $lib = file($0)->dir; }
use lib "$lib";
use DBI;
say "ok";
'
ok
In your case quotes are completely useless. We can even say that it is wrong because this is not idiomatic, as others wrote.
However, quoting a variable may sometimes be necessary: this explicitly triggers stringification of the value of the variable. Stringification may give a different result for some values if those values are dual vars or if they are blessed values with overloaded stringification.
Here is an example with dual vars:
use 5.010;
use strict;
use Scalar::Util 'dualvar';
my $x = dualvar 1, "2";
say 0+$x;
say 0+"$x";
Output:
1
2
My theory has always been that it's people coming over from other languages with bad habits. It's not that they're thinking "I will use double quotes all the time", but that they're just not thinking!
I'll be honest and say that I used to fall into this trap because I came to Perl from Java, so the muscle memory was there, and just kept firing.
PerlCritic finally got me out of the habit!
It definitely makes your code less efficient, but if you're not thinking about whether or not you want your strings interpolated, you are very likely to make silly mistakes, so I'd go further and say that it's dangerous.

Is it a design flaw that Perl subs aren't lexically scoped?

{
sub a {
print 1;
}
}
a;
A bug, is it?
a should not be available from outside.
Does it work in Perl 6*?
* Sorry, I don't have it installed yet.
Are you asking why the sub is visible outside the block? If so then it's because the compile time sub keyword puts the sub in the main namespace (unless you use the package keyword to create a new namespace). You can try something like
{
my $a = sub {
print 1;
};
$a->(); # works
}
$a->(); # fails
In this case the sub keyword is not creating a sub and putting it in the main namespace, but instead creating an anonymous subroutine and storing it in the lexically scoped variable. When the variable goes out of scope, it is no longer available (usually).
To read more check out perldoc perlsub
Also, did you know that you can inspect the way the Perl parser sees your code? Run perl with the flag -MO=Deparse as in perl -MO=Deparse yourscript.pl. Your original code parses as:
sub a {
print 1;
}
{;};
a ;
The sub is compiled first, then a block is run with no code in it, then a is called.
For my example in Perl 6 see: Success, Failure. Note that in Perl 6, dereference is . not ->.
Edit: I have added another answer about new experimental support for lexical subroutines expected for Perl 5.18.
In Perl 6, subs are indeed lexically scoped, which is why the code throws an error (as several people have pointed out already).
This has several interesting implications:
nested named subs work as proper closures (see also: the "will not stay shared" warning in perl 5)
importing of subs from modules works into lexical scopes
built-in functions are provided in an outer lexical scope (the "setting") around the program, so overriding is as easy as declaring or importing a function of the same name
since lexpads are immutable at run time, the compiler can detect calls to unknown routines at compile time (niecza does that already, Rakudo only in the "optimizer" branch).
Subroutines are package scoped, not block scoped.
#!/usr/bin/perl
use strict;
use warnings;
package A;
sub a {
print 1, "\n";
}
a();
1;
package B;
sub a {
print 2, "\n";
}
a();
1;
Named subroutines in Perl are created as global names. Other answers have shown how to create a lexical subroutine by assigning an anonymous sub to a lexical variable. Another option is to use a local variable to create a dynamically scoped sub.
The primary differences between the two are call style and visibility. The dynamically scoped sub can be called like a named sub, and it will also be globally visible until the block it is defined in is left.
use strict;
use warnings;
sub test_sub {
print "in test_sub\n";
temp_sub();
}
{
local *temp_sub = sub {
print "in temp_sub\n";
};
temp_sub();
test_sub();
}
test_sub();
This should print
in temp_sub
in test_sub
in temp_sub
in test_sub
Undefined subroutine &main::temp_sub called at ...
At the risk of another scolding by @tchrist, I am adding another answer for completeness. The as yet to be released Perl 5.18 is expected to include lexical subroutines as an experimental feature.
Here is a link to the relevant documentation. Again, this is very experimental, it should not be used for production code for two reasons:
It might not be well implemented yet
It might be removed without notice
So play with this new toy if you want, but you have been warned!
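For anyone who wants to play with it, a minimal sketch follows. It assumes Perl 5.18 or later; as far as I know the feature stopped being experimental in 5.26, where the two pragma lines below become harmless. The sub names are made up.

```perl
use strict;
use warnings;
use feature 'lexical_subs';
no warnings 'experimental::lexical_subs';

sub outer {
    my sub helper { return 41 }   # visible only inside outer's block
    return helper() + 1;
}

print outer(), "\n";                                   # 42
print main->can('helper') ? "leaked\n" : "private\n";  # private
```

The lexical sub never enters the package symbol table, so nothing outside the enclosing block can call it by name.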
If you see the code compile, run and print "1", then you are not experiencing a bug.
You seem to be expecting subroutines to only be callable inside the lexical scope in which they are defined. That would be bad, because that would mean that one wouldn't be able to call subroutines defined in other files. Maybe you didn't realise that each file is evaluated in its own lexical scope? That allows the likes of
my $x = ...;
sub f { $x }
Yes, I think it is a design flaw - more specifically, the initial choice of using dynamic scoping rather than lexical scoping made in Perl, which naturally leads to this behavior. But not all language designers and users would agree. So the question you ask doesn't have a clear answer.
Lexical scoping was added in Perl 5, but as an optional feature: you always need to request it explicitly. With that design choice I fully agree: backward compatibility is important.