Is it a design flaw that Perl subs aren't lexically scoped? - perl

{
sub a {
print 1;
}
}
a;
A bug,is it?
a should not be available from outside.
Does it work in Perl 6*?
* Sorry I don't have installed it yet.

Are you asking why the sub is visible outside the block? If so then its because the compile time sub keyword puts the sub in the main namespace (unless you use the package keyword to create a new namespace). You can try something like
{
my $a = sub {
print 1;
};
$a->(); # works
}
$a->(); # fails
In this case the sub keyword is not creating a sub and putting it in the main namespace, but instead creating an anonymous subroutine and storing it in the lexically scoped variable. When the variable goes out of scope, it is no longer available (usually).
To read more check out perldoc perlsub
Also, did you know that you can inspect the way the Perl parser sees your code? Run perl with the flag -MO=Deparse as in perl -MO=Deparse yourscript.pl. Your original code parses as:
sub a {
print 1;
}
{;};
a ;
The sub is compiled first, then a block is run with no code in it, then a is called.
For my example in Perl 6 see: Success, Failure. Note that in Perl 6, dereference is . not ->.
Edit: I have added another answer about new experimental support for lexical subroutines expected for Perl 5.18.

In Perl 6, subs are indeed lexically scoped, which is why the code throws an error (as several people have pointed out already).
This has several interesting implications:
nested named subs work as proper closures (see also: the "will not stay shared" warning in perl 5)
importing of subs from modules works into lexical scopes
built-in functions are provided in an outer lexical scope (the "setting") around the program, so overriding is as easy as declaring or importing a function of the same name
since lexpads are immutable at run time, the compiler can detect calls to unknown routines at compile time (niecza does that already, Rakudo only in the "optimizer" branch).

Subroutines are package scoped, not block scoped.
#!/usr/bin/perl
use strict;
use warnings;
package A;
sub a {
print 1, "\n";
}
a();
1;
package B;
sub a {
print 2, "\n";
}
a();
1;

Named subroutines in Perl are created as global names. Other answers have shown how to create a lexical subroutines by assigning an anonymous sub to a lexical variable. Another option is to use a local variable to create a dynamically scoped sub.
The primary differences between the two are call style and visibility. The dynamically scoped sub can be called like a named sub, and it will also be globally visible until the block it is defined in is left.
use strict;
use warnings;
sub test_sub {
print "in test_sub\n";
temp_sub();
}
{
local *temp_sub = sub {
print "in temp_sub\n";
};
temp_sub();
test_sub();
}
test_sub();
This should print
in temp_sub
in test_sub
in temp_sub
in test_sub
Undefined subroutine &main::temp_sub called at ...

At the risk of another scolding by #tchrist, I am adding another answer for completeness. The as yet to be released Perl 5.18 is expected to include lexical subroutines as an experimental feature.
Here is a link to the relevant documentation. Again, this is very experimental, it should not be used for production code for two reasons:
It might not be well implemented yet
It might be removed without notice
So play with this new toy if you want, but you have been warned!

If you see the code compile, run and print "1", then you are not experiencing a bug.
You seem to be expecting subroutines to only be callable inside the lexical scope in which they are defined. That would be bad, because that would mean that one wouldn't be able to call subroutines defined in other files. Maybe you didn't realise that each file is evaluated in its own lexical scope? That allows the likes of
my $x = ...;
sub f { $x }

Yes, I think it is a design flaw - more specifically, the initial choice of using dynamic scoping rather than lexical scoping made in Perl, which naturally leads to this behavior. But not all language designers and users would agree. So the question you ask doesn't have a clear answer.
Lexical scoping was added in Perl 5, but as an optional feature, you always need to indicate it specifically. With that design choice I fully agree: backward compatibility is important.

Related

Can variable declarations be placed in a common script

Before I start, the whole 'concept' may be technically impossible; hopefully someone will have more knowledge about such things, and advise me.
With Perl, you can "declare" global variables at the start of a script via my / our thus:
my ($a,$b,$c ..)
That's fine with a few unique variables. But I am using about 50 of them ... and the same names (not values) are used by five scripts. Rather than having to place huge my( ...) blocks at the start of each file, I'm wondering if there is a way to create them in one script. Note: Declare the namespace, not their values.
I have tried placing them all in a single file, with the shebang at the top, and a 1 at the bottom, and then tried "require", "use" and "do" to load them in. But - at certain times -the script complains it cannot find the global package name. (Maybe the "paths.pl" is setting up the global space relative to itself - which cannot be 'seen' by the other scripts)
Looking on Google, somebody suggested setting variables in the second file, and still setting the my in the calling script ... but that is defeating the object of what I'm trying to do, which is simply declare the name space once, and setting the values in another script
** So far, it seems if I go from a link in an HTML page to a perl script, the above method works. But when I call a script via XHTMLRequest using a similar setup, it cannot find the $a, $b, $c etc within the "paths" script
HTML
<form method="post" action="/cgi-bin/track/script1.pl>
<input type="submit" value="send"></form>
Perl: (script1.pl)
#shebang
require "./paths.pl"
$a=1;
$b="test";
print "content-type: text/html\n\n";
print "$a $b";
Paths.pl
our($a,
$b,
$c ...
)1;
Seems to work OK, with no errors. But ...
# Shebang
require "./paths.pl"
XHTMLREQUEST script1.pl
Now it complains it cannot find $a or $b etc as an "explicit package" for "script1.pl"
Am I moving into the territory of "modules" - of which I know little. Please bear in mind, I am NOT declaring values within the linked file, but rather setting up the 'global space' so that they can be used by all scripts which declare their own values.
(On a tangent, I thought - in the past - a file in the same directory could be accessed as "paths.pl" -but it won't accept that, and it insists on "./" Maybe this is part of the problem. I have tried absolute and relative paths too, from "url/cgi-bin/track/" to "/cgi-bin/track" but can't seem to get that to work either)
I'm fairly certain it's finding the paths file as I placed a "my value" before the require, and set a string within paths, and it was able to print it out.
First, lexical (my) variables only exist in their scope. A file is a scope, so they only exist in their file. You are now trying to work around that, and when you find yourself fighting the language that way, you should realize that you are doing it wrong.
You should move away from declaring all variables in one go at the top of a program. Declare them near the scope you want to use them, and declare them in the smallest scope possible.
You say that you want to "Set up a global space", so I think you might misunderstand something. If you want to declare a lexical variable in some scope, you just do it. You don't have to do anything else to make that possible.
Instead of this:
my( $foo, $bar, $baz );
$foo = 5;
sub do_it { $bar = 9; ... }
while( ... ) { $baz = 6; ... }
Declare the variable just where you want them:
my $foo = 5;
sub do_it { my $bar = 9; ... }
while( ... ) { my $baz = 6; ... }
Every lexical variable should exist in the smallest scope that can tolerate it. That way nothing else can mess with it and it doesn't retain values from previous operations when it shouldn't. That's the point of them, after all.
When you declare them to be file scoped, then don't declare them in the scope that uses them, you might have two unrelated uses of the same name conflicting with each other. One of the main benefits of lexical variables is that you don't have to know the names of any other variables in scope or in the program:
my( $foo, ... );
while( ... ) {
$foo = ...;
do_something();
...
}
sub do_something {
$foo = ...;
}
Are those uses of $foo in the while and the sub the same, or do they accidentally have the same name? That's a cruel question to leave up to the maintenance program.
If they are the same thing, make the subroutine get its value from its argument list instead. You can use the same names, but since each scope has it's own lexical variables, they don't interfere with each other:
while( ... ) {
my $foo = ...;
do_something($foo);
...
}
sub do_something {
my( $foo ) = #_;
}
See also:
How to share/export a global variable between two different perl scripts?
You say you aren't doing what I'm about to explain, but other people may want to do something similar to share values. Since you are sharing the same variable names across programs, I suspect that this is actually what it going on, though.
In that case, there are many modules on CPAN that can do that job. What you choose depends on what sort of stuff you are trying to share between programs. I have a chapter in Mastering Perl all about it.
You might be able to get away with something like this, where one module defines all the values and makes them available for export:
# in Local/Config.pm
package Local::Config;
use Exporter qw(import);
our #EXPORT = qw( $foo $bar );
our $foo = 'Some value';
our $bar = 'Different value';
1;
To use this, merely load it with use. It will automatically import the variables that you put in #EXPORT:
# in some program
use Local::Config;
We cover lots of this sort of stuff in Intermediate Perl.
What you want to do here is a form of boilerplate management. Shoving variable declarations into a module or class file. This is a laudable goal. In fact you should shove as much boilerplate into that other module as possible. It makes it far easier to keep consistent behavior across the many scripts in a project. However shoving variables in there will not be as easy as you think.
First of all, $a and $b are special variables reserved for use in sort blocks so they never have to be declared. So using them here will not validate your test. require always searches for the file in #INC. See perlfunc require.
To declare a variable it has to be done at compile time. our, my, and state all operate at compile time and legalize a symbol in a lexical scope. Since a module is a scope, and require and do both create a scope for that file, there is no way to have our (let alone my and state) reach back to a parent scope to declare a symbol.
This leaves you with two options. Export package globals back to the calling script or munge the script with a source filter. Both of these will give you heartburn. Remember that it has to be done at compile time.
In the interest of computer science, here's how you would do it (but don't do it).
#boilerplate.pm
use strict;
use vars qw/$foo $bar/;
1;
__END__
#script.pl
use strict;
use boilerplate;
$foo = "foo here";
use vars is how you declare package globals when strict is in effect. Package globals are unscoped ("global") so it doesn't matter what scope or file they're declared in. (NB: our does not create a global like my creates a lexical. our creates a lexical alias to a global, thus exposing whatever is there.) Notice that boilerplate.pm has no package declaration. It will inherit whatever called it which is what you want.
The second way using source filters is devious. You create a module that rewrites the source code of your script on the fly. See Filter::Simple and perlfilter for more information. This only works on real scripts, not perl -e ....
#boilerplate.pm
package boilerplate;
use strict; use diagnostics;
use Filter::Simple;
my $injection = '
our ($foo, $bar);
my ($baz);
';
FILTER { s/__FILTER__/$injection/; }
__END__
#script.pl
use strict; use diagnostics;
use boilerplate;
__FILTER__
$foo = "foo here";
You can make any number of filtering tokens or scenarios for code substitution. e.g. use boilerplate qw/D2_loadout/;
These are the only ways to do it with standard Perl. There are modules that let you meddle with calling scopes through various B modules but you're on your own there. Thanks for the question!
HTH

Why declare Perl variable with "my" at file scope?

I'm learning Perl and trying to understand variable scope. I understand that my $name = 'Bob'; will declare a local variable inside a sub, but why would you use the my keyword at the global scope? Is it just a good habit so you can safely move the code into a sub?
I see lots of example scripts that do this, and I wonder why. Even with use strict, it doesn't complain when I remove the my. I've tried comparing behaviour with and without it, and I can't see any difference.
Here's one example that does this:
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
my $dbfile = "sample.db";
my $dsn = "dbi:SQLite:dbname=$dbfile";
my $user = "";
my $password = "";
my $dbh = DBI->connect($dsn, $user, $password, {
PrintError => 0,
RaiseError => 1,
AutoCommit => 1,
FetchHashKeyName => 'NAME_lc',
});
# ...
$dbh->disconnect;
Update
It seems I was unlucky when I tested this behaviour. Here's the script I tested with:
use strict;
my $a = 5;
$b = 6;
sub print_stuff() {
print $a, $b, "\n"; # prints 56
$a = 55;
$b = 66;
}
print_stuff();
print $a, $b, "\n"; # prints 5566
As I learned from some of the answers here, $a and $b are special variables that are already declared, so the compiler doesn't complain. If I change the $b to $c in that script, then it complains.
As for why to use my $foo at the global scope, it seems like the file scope may not actually be the global scope.
The addition of my was about the best thing that ever happened to Perl and the problem it solved was typos.
Say you have a variable $variable. You do some assignments and comparisons on this variable.
$variable = 5;
# intervening assignments and calculations...
if ( $varable + 20 > 25 ) # don't use magic numbers in real code
{
# do one thing
}
else
{
# do something else
}
Do you see the subtle bug in the above code that happens if you don't use strict; and require variables be declared with my? The # do one thing case will never happen. I encountered this several times in production code I had to maintain.
A few points:
strict demands that all variables be declared with a my (or state) or installed into the package--declared with an our statement or a use vars pragma (archaic), or inserted into the symbol table at compile time.
They are that file's variables. They remain of no concern and no use to any module required during the use of that file.
They can be used across packages (although that's a less good reason.)
Lexical variables don't have any of the magic that the only alternative does. You can't "push" and "pop" a lexical variable as you change scope, as you can with any package variable. No magic means faster and plainer handling.
Laziness. It's just easier to declare a my with no brackets as opposed to concentrating its scope by specific bracketing.
{ my $visible_in_this_scope_only;
...
sub bananas {
...
my $bananas = $visible_in_this_scope_only + 3;
...
}
} # End $visible_in_this_scope_only
(Note on the syntax: in my code, I never use a bare brace. It will always tell you, either before (standard loops) or after what the scope is for, even if it would have been "obvious".
It's just good practice. As a personal rule, I try to keep variables in the smallest scope possible. If a line of code can't see a variable, then it can't mess with it in unexpected ways.
I'm surprised that you found that the script worked under use strict without the my, though. That's generally not allowed:
$ perl -E 'use strict; $db = "foo"; say $db'
Global symbol "$db" requires explicit package name at -e line 1.
Global symbol "$db" requires explicit package name at -e line 1.
Execution of -e aborted due to compilation errors.
$ perl -E 'use strict; my $db = "foo"; say $db'
foo
Variables $a and $b are exempt:
$ perl -E 'use strict; $b = "foo"; say $b'
foo
But I don't know how you would make the code you posted work with strict and a missing my.
A sub controls/limits the scope of variables between the braces {} that define its operations. Of course many variables exist outside of a particular function and using lexical my for "global" variables can give you more control over how "dynamic" their behavior is inside your application. The Private Variables via my() section of perlodocperlsub discusses reasons for doing this pretty thoroughly.
I'm going to quote myself from elsewhere which is not the best thing to do on SO but here goes:
The classic perlmonks node - Variable Scoping in Perl: the
basics - is a frequently
consulted reference :-)
As I noted in a comment, Bruce Gray's talk at YAPC::NA 2012 - The why of my() is a good story about how a pretty expert perl programmer wrapped his head around perl and namespaces.
I've heard people explain my as Perl's equivalent to Javascript's var - it's practically necessary but, Perl being perl, things will work without it if you insist or take pains to make it do that.
ps: Actually with Javascript, I guess functions are used to control "scope" in a way that is analagous to your description of using my in sub's.

Why can't localize lexical variable in Perl?

I have below Perl code.
use warnings;
use strict;
my $x = "global\n";
sub a {
print $x;
}
sub b {
local $x = "local\n";
a();
}
a();
b();
a();
Even if $x has scope inside b() subroutine why Perl doesn't allows to localize it ?
You cannot mix the lexical scope used by my with the namespaced global scope of package variables (the keyword local may only be used on the latter). Perl will treat $x in the source as the lexically-scoped variable once you have defined it as such. You can still access the package variable (using $::x) - although that would just mean you had two entirely separate variables in use, and would not allow you to refer to either at the same time as $x.
You can achieve something very similar to what you appear to be trying to do by using our instead of my:
our $x = "global\n";
The our keyword creates a lexically-scoped alias to a package variable.
Output is then:
global
local
global
I just wanted to know what is the motivation behind this constraints.
my is thought of as a creator of statically scoped variables.
local is thought of as a creator of dynamically scoped variables.
So you have two similarly-named variables in your program. Given the whole point of my was to replace local, of course my takes precedence over local and not the other way around. You would lose the benefits of my if it was the other way around.

What is meant by proper closure here

This is a code lifted straight from Perl Cookbook:
#colors = qw(red blue green yellow orange purple violet);
for my $name (#colors) {
no strict 'refs';
*$name = sub { "<FONT COLOR='$name'>#_</FONT>" };
}
It's intention is to form 6 different subroutines with names of different colors. In the explanation part, the book reads:
These functions all seem independent, but the real code was in fact only compiled once. This technique
saves on both compile time and memory use. To create a proper closure, any variables in the anonymous
subroutine must be lexicals. That's the reason for the my on the loop iteration variable.
What is meant by proper closure, and what will happen if the my is omitted? Plus how come a typeglob is working with a lexical variable, even though typeglobs cannot be defined for lexical variables and should throw error?
As others have mentioned the cookbook is using the term "proper" to refer to the fact that a subroutine is created that carries with it a variable that is from a higher lexical scope and that this variable can no longer be reached by any other means. I use the over simplified mnemonic "Access to the $color variable is 'closed'" to remember this part of closures.
The statement "typeglobs cannot be defined for lexical variables" misunderstands a few key points about typeglobs. It is somewhat true it you read it as "you cannot use 'my' to create a typeglob". Consider the following:
my *red = sub { 'this is red' };
This will die with "syntax error near "my *red" because its trying to define a typeglob using the "my" keyword.
However, the code from your example is not trying to do this. It is defining a typeglob which is global unless overridden. It is using the value of a lexical variable to define the name of the typeglob.
Incidentally a typeglob can be lexically local. Consider the following:
my $color = 'red';
# create sub with the name "main::$color". Specifically "main:red"
*$color = sub { $color };
# preserve the sub we just created by storing a hard reference to it.
my $global_sub = \&$color;
{
# create a lexically local sub with the name "main::$color".
# this overrides "main::red" until this block ends
local *$color = sub { "local $color" };
# use our local version via a symbolic reference.
# perl uses the value of the variable to find a
# subroutine by name ("red") and executes it
print &$color(), "\n";
# use the global version in this scope via hard reference.
# perl executes the value of the variable which is a CODE
# reference.
print &$global_sub(), "\n";
# at the end of this block "main::red" goes back to being what
# it was before we overrode it.
}
# use the global version by symbolic reference
print &$color(), "\n";
This is legal and the output will be
local red
red
red
Under warnings this will complain "Subroutine main::red redefined"
I believe that "proper closure" just means actually a closure. If $name is not a lexical, all the subs will refer to the same variable (whose value will have been reset to whatever value it had before the for loop, if any).
*$name is using the value of $name as the reference for the funny kind of dereferencing a * sigil does. Since $name is a string, it is a symbolic reference (hence the no strict 'refs').
From perlfaq7:
What's a closure?
Closures are documented in perlref.
Closure is a computer science term with a precise but hard-to-explain
meaning. Usually, closures are implemented in Perl as anonymous
subroutines with lasting references to lexical variables outside their
own scopes. These lexicals magically refer to the variables that were
around when the subroutine was defined (deep binding).
Closures are most often used in programming languages where you can
have the return value of a function be itself a function, as you can
in Perl. Note that some languages provide anonymous functions but are
not capable of providing proper closures: the Python language, for
example. For more information on closures, check out any textbook on
functional programming. Scheme is a language that not only supports
but encourages closures.
The answer carries on to answer the question in some considerable detail.
The Perl FAQs are a great resource. But it's a bit of a waste of effort maintaining them if no-one ever reads them.
Edit (to better answer the later questions in your post):
1/ What is meant by proper closure? - see above
2/ What will happen if the my is omitted? - It won't work. Try it and see what happens.
3/ How come a typeglob is working with a lexical variable? - The variable $name is only being used to define the name of the typeglob to work with. It's not actually being used as a typeglob.
Does that help more?

Should I call Perl subroutines with no arguments as marine() or marine?

As per my sample code below, there are two styles to call a subroutine: subname and subname().
#!C:\Perl\bin\perl.exe
use strict;
use warnings;
use 5.010;
&marine(); # style 1
&marine; # style 2
sub marine {
state $n = 0; # private, persistent variable $n
$n += 1;
print "Hello, sailor number $n!\n";
}
Which one, &marine(); or &marine;, is the better choice if there are no arguments in the call?
In Learning Perl, where this example comes from, we're at the very beginning of showing you subroutines. We only tell you to use the & so that you, as the beginning Perler, don't run into a problem where you define a subroutine with the same name as a Perl built-in then wonder why it doesn't work. The & in front always calls your defined subroutine. Beginning students often create their own subroutine log to print a message because they are used to doing that in other technologies they use. In Perl, that's the math function builtin.
After you get used to using Perl and you know about the Perl built-ins (scan through perlfunc), drop the &. There's some special magic with & that you hardly ever need:
marine();
You can leave off the () if you've pre-declared the subroutine, but I normally leave the () there even for an empty argument list. It's a bit more robust since you're giving Perl the hint that the marine is a subroutine name. To me, I recognize that more quickly as a subroutine.
The side effect of using & without parentheses is that the subroutine is invoked with #_. This program
sub g {
print "g: #_\n";
}
sub f {
&g(); # g()
&g; # g(#_)
g(); # g()
g; # g()
}
f(1,2,3);
produces this output:
g:
g: 1 2 3
g:
g:
It's good style to declare your subroutines first with the sub keyword, then call them. (Of course there are ways around it, but why make things more complicated than necessary?)
Do not use the & syntax unless you know what it does exactly to #_ and subroutines declared with prototypes. It is terribly obscure, rarely needed and a source of bugs through unintended behaviour. Just leave it away – Perl::Critic aptly says about it:
Since Perl 5, the ampersand sigil is completely optional when invoking subroutines.
Now, given following these style hints, I prefer to call subroutines that require no parameters in style 1, that is to say marine();. The reasons are
visual consistency with subroutines that do require parameters
it cannot be confused with a different keyword.
As a general rule I recommend the following:
Unless you need the & because you're over riding a built in function or you have no parameter list omit it.
Always include the () as in marine().
I do both of these for code readability. The first rule makes it clear when I'm overriding internal Perl functions by making them distinct. The second makes it clear when I'm invoking functions.
perl allows you to omit parenthesis in your function call.
So you can call your function with arguments in two different ways:
your_function( arg1,arg2,arg3);
or
your function arg1,arg2,arg3 ;
Its a matter of choice that which form do you prefer. With users from C background the former is more intuitive.
I personally use the former for functions defined by me and latter for built in functions like:
print "something" instead of print("something")