What is wrong in "my $foo = $x if $y" syntax? - perl

In my last question here, #amon gave an great answer. However, he told too:
First of all, please don't do my $foo = $x if $y. You get unexpected
and undefined behavior, so it is best to avoid that syntax.
Because the above construction I was see in really many sources in the CPAN, I'm wondering how, when, where can be it wrong. (Some example code would be nice). Wondering too, why perl allows it, if it is bad.

His wording was actually a bit laxer. That wording is actually mine. Let's start with the documentation: (Emphasis in original)
NOTE: The behaviour of a my, state, or our modified with a statement modifier conditional or loop construct (for example, my $x if ...) is undefined. The value of the my variable may be undef, any previously assigned value, or possibly anything else. Don't rely on it. Future versions of perl might do something different from the version of perl you try it out on. Here be dragons.
To be more precise, the problem is using a lexical variable when its my may not have been executed.
Consider:
# Usage:
# f($x) # Store a value
# f() # Fetch and clear the stored value
sub f {
my $x if !#_;
if (#_) {
$x = $_[0];
} else {
return $x;
}
}
f('abc');
say "<", f(), ">" # abc
This is obviously not the documented behaviour of my.
Because the above construction I was see in really many sources in the CPAN
That code is buggy. If you want a value to persist between calls to a sub, you can use a state variable since Perl 5.10, or a variable outside of the sub.

Related

Why does the Dumper module place parens around the assignment of its settings hash?

Having read this question and this question on the differences between my $var and my ($var), I was still unable to see why the Data::Dumper module uses parens in the following excerpt from its code. None of the differences described in the answers to those questions seem to apply here.
my($s) = {
level => 0, # current recursive depth
indent => $Indent, # various styles of indenting
# a bunch of other settings removed for brevity's sake
deparse => $Deparse, # use B::Deparse for coderefs
noseen => $Sparseseen, # do not populate the seen hash unless necessary
};
I tested it out in a small script and I can't perceive any difference between declaring this as my ($s) or as my $s. In both cases it is a scalar reference to a hash, so far as I can tell.
Am I missing something?
I agree, it is weird. In general parentheses on the left side of an assignment force the right side to be evaluated in list context instead of scalar context. But that accomplishes nothing in the above example.
It does, however, seem to be consistent with many other needless uses of my(...) = ... in Data::Dumper, such as this one:
sub Reset {
my($s) = shift;
$s->{seen} = {};
return $s;
}
Still, it's not consistent, as you also find plenty of examples where it's not used, such as:
my $ref = \$_[1];
my $v;
Maybe it's an occasional personal preference of the author, or else they planned for multiple assignment and never cleaned up their code after... Or maybe multiple authors with different preferences who hesitated to step on each others' toes, or fix what they figured wasn't broke. But that's just speculation...
There's no reason for my ($s) to be used here instead of my $s. I can't even fathom a stylistic reason. It is indeed weird.

redefining or overloading backticks

I have a lot of legacy code which shells out a lot, what i want to do is add a require or minimal code changes to make the backticks do something different, for instance print instead of running code
i tried using use subs but i couldn't get it to take over backticks or qx (i did redefine system which is one less thing to worry about)
i also tried to make a package:
package thingmbob;
use Data::Dumper;
use overload '``' => sub { CORE::print "things!:\t", Dumper \#_};
#this works for some reason
$thingmbob::{'(``'}('ls');
#this does the standard backtick operation
`ls`
unfourtunatly, I have no experience in OOP perl and my google-fu skills are failing me, could some one point me in the right direction?
caveat:
I'm in a closed system with a few cpan modules preinstalled, odds are that i don't have any fancy modules preinstalled and i absolutely cannot get new ones
I'm on perl5.14
edit:
for the sake of completeness i want to add my (mostly) final product
BEGIN {
*CORE::GLOBAL::readpipe = sub {
print Dumper(\#_);
#internal = readpipe(#_);
if(wantarray){
return #internal;
}else{
return join('',#internal);
}
};
}
I want it to print what its about to run and then run it. the wantarray is important because without it scalar context does not work
This perlmonks article explains how to do it. You can overwrite the readpipe built-in.
EXPR is executed as a system command. The collected standard output of the command is returned. In scalar context, it comes back as a single (potentially multi-line) string. In list context, returns a list of lines (however you've defined lines with $/ (or $INPUT_RECORD_SEPARATOR in English)). This is the internal function implementing the qx/EXPR/ operator, but you can use it directly. The qx/EXPR/ operator is discussed in more detail in I/O Operators in perlop. If EXPR is omitted, uses $_ .
You need to put this into a BEGIN block, so it would make sense to not require, but use it instead to make it available as early as possible.
Built-ins are overridden using the CORE::GLOBAL:: namespace.
BEGIN {
*CORE::GLOBAL::readpipe = sub {
print "#_";
}
}
print qx/ls/;
print `ls`;
This outputs:
ls1ls1
Where the ls is the #_ and the 1 is the return value of print inside the overridden sub.
Alternatively, there is ex::override, which does the same under the hood, but with less weird internals.

Why declare Perl variable with "my" at file scope?

I'm learning Perl and trying to understand variable scope. I understand that my $name = 'Bob'; will declare a local variable inside a sub, but why would you use the my keyword at the global scope? Is it just a good habit so you can safely move the code into a sub?
I see lots of example scripts that do this, and I wonder why. Even with use strict, it doesn't complain when I remove the my. I've tried comparing behaviour with and without it, and I can't see any difference.
Here's one example that does this:
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
my $dbfile = "sample.db";
my $dsn = "dbi:SQLite:dbname=$dbfile";
my $user = "";
my $password = "";
my $dbh = DBI->connect($dsn, $user, $password, {
PrintError => 0,
RaiseError => 1,
AutoCommit => 1,
FetchHashKeyName => 'NAME_lc',
});
# ...
$dbh->disconnect;
Update
It seems I was unlucky when I tested this behaviour. Here's the script I tested with:
use strict;
my $a = 5;
$b = 6;
sub print_stuff() {
print $a, $b, "\n"; # prints 56
$a = 55;
$b = 66;
}
print_stuff();
print $a, $b, "\n"; # prints 5566
As I learned from some of the answers here, $a and $b are special variables that are already declared, so the compiler doesn't complain. If I change the $b to $c in that script, then it complains.
As for why to use my $foo at the global scope, it seems like the file scope may not actually be the global scope.
The addition of my was about the best thing that ever happened to Perl and the problem it solved was typos.
Say you have a variable $variable. You do some assignments and comparisons on this variable.
$variable = 5;
# intervening assignments and calculations...
if ( $varable + 20 > 25 ) # don't use magic numbers in real code
{
# do one thing
}
else
{
# do something else
}
Do you see the subtle bug in the above code that happens if you don't use strict; and require variables be declared with my? The # do one thing case will never happen. I encountered this several times in production code I had to maintain.
A few points:
strict demands that all variables be declared with a my (or state) or installed into the package--declared with an our statement or a use vars pragma (archaic), or inserted into the symbol table at compile time.
They are that file's variables. They remain of no concern and no use to any module required during the use of that file.
They can be used across packages (although that's a less good reason.)
Lexical variables don't have any of the magic that the only alternative does. You can't "push" and "pop" a lexical variable as you change scope, as you can with any package variable. No magic means faster and plainer handling.
Laziness. It's just easier to declare a my with no brackets as opposed to concentrating its scope by specific bracketing.
{ my $visible_in_this_scope_only;
...
sub bananas {
...
my $bananas = $visible_in_this_scope_only + 3;
...
}
} # End $visible_in_this_scope_only
(Note on the syntax: in my code, I never use a bare brace. It will always tell you, either before (standard loops) or after what the scope is for, even if it would have been "obvious".
It's just good practice. As a personal rule, I try to keep variables in the smallest scope possible. If a line of code can't see a variable, then it can't mess with it in unexpected ways.
I'm surprised that you found that the script worked under use strict without the my, though. That's generally not allowed:
$ perl -E 'use strict; $db = "foo"; say $db'
Global symbol "$db" requires explicit package name at -e line 1.
Global symbol "$db" requires explicit package name at -e line 1.
Execution of -e aborted due to compilation errors.
$ perl -E 'use strict; my $db = "foo"; say $db'
foo
Variables $a and $b are exempt:
$ perl -E 'use strict; $b = "foo"; say $b'
foo
But I don't know how you would make the code you posted work with strict and a missing my.
A sub controls/limits the scope of variables between the braces {} that define its operations. Of course many variables exist outside of a particular function and using lexical my for "global" variables can give you more control over how "dynamic" their behavior is inside your application. The Private Variables via my() section of perlodocperlsub discusses reasons for doing this pretty thoroughly.
I'm going to quote myself from elsewhere which is not the best thing to do on SO but here goes:
The classic perlmonks node - Variable Scoping in Perl: the
basics - is a frequently
consulted reference :-)
As I noted in a comment, Bruce Gray's talk at YAPC::NA 2012 - The why of my() is a good story about how a pretty expert perl programmer wrapped his head around perl and namespaces.
I've heard people explain my as Perl's equivalent to Javascript's var - it's practically necessary but, Perl being perl, things will work without it if you insist or take pains to make it do that.
ps: Actually with Javascript, I guess functions are used to control "scope" in a way that is analagous to your description of using my in sub's.

What is meant by proper closure here

This is a code lifted straight from Perl Cookbook:
#colors = qw(red blue green yellow orange purple violet);
for my $name (#colors) {
no strict 'refs';
*$name = sub { "<FONT COLOR='$name'>#_</FONT>" };
}
It's intention is to form 6 different subroutines with names of different colors. In the explanation part, the book reads:
These functions all seem independent, but the real code was in fact only compiled once. This technique
saves on both compile time and memory use. To create a proper closure, any variables in the anonymous
subroutine must be lexicals. That's the reason for the my on the loop iteration variable.
What is meant by proper closure, and what will happen if the my is omitted? Plus how come a typeglob is working with a lexical variable, even though typeglobs cannot be defined for lexical variables and should throw error?
As others have mentioned the cookbook is using the term "proper" to refer to the fact that a subroutine is created that carries with it a variable that is from a higher lexical scope and that this variable can no longer be reached by any other means. I use the over simplified mnemonic "Access to the $color variable is 'closed'" to remember this part of closures.
The statement "typeglobs cannot be defined for lexical variables" misunderstands a few key points about typeglobs. It is somewhat true it you read it as "you cannot use 'my' to create a typeglob". Consider the following:
my *red = sub { 'this is red' };
This will die with "syntax error near "my *red" because its trying to define a typeglob using the "my" keyword.
However, the code from your example is not trying to do this. It is defining a typeglob which is global unless overridden. It is using the value of a lexical variable to define the name of the typeglob.
Incidentally a typeglob can be lexically local. Consider the following:
my $color = 'red';
# create sub with the name "main::$color". Specifically "main:red"
*$color = sub { $color };
# preserve the sub we just created by storing a hard reference to it.
my $global_sub = \&$color;
{
# create a lexically local sub with the name "main::$color".
# this overrides "main::red" until this block ends
local *$color = sub { "local $color" };
# use our local version via a symbolic reference.
# perl uses the value of the variable to find a
# subroutine by name ("red") and executes it
print &$color(), "\n";
# use the global version in this scope via hard reference.
# perl executes the value of the variable which is a CODE
# reference.
print &$global_sub(), "\n";
# at the end of this block "main::red" goes back to being what
# it was before we overrode it.
}
# use the global version by symbolic reference
print &$color(), "\n";
This is legal and the output will be
local red
red
red
Under warnings this will complain "Subroutine main::red redefined"
I believe that "proper closure" just means actually a closure. If $name is not a lexical, all the subs will refer to the same variable (whose value will have been reset to whatever value it had before the for loop, if any).
*$name is using the value of $name as the reference for the funny kind of dereferencing a * sigil does. Since $name is a string, it is a symbolic reference (hence the no strict 'refs').
From perlfaq7:
What's a closure?
Closures are documented in perlref.
Closure is a computer science term with a precise but hard-to-explain
meaning. Usually, closures are implemented in Perl as anonymous
subroutines with lasting references to lexical variables outside their
own scopes. These lexicals magically refer to the variables that were
around when the subroutine was defined (deep binding).
Closures are most often used in programming languages where you can
have the return value of a function be itself a function, as you can
in Perl. Note that some languages provide anonymous functions but are
not capable of providing proper closures: the Python language, for
example. For more information on closures, check out any textbook on
functional programming. Scheme is a language that not only supports
but encourages closures.
The answer carries on to answer the question in some considerable detail.
The Perl FAQs are a great resource. But it's a bit of a waste of effort maintaining them if no-one ever reads them.
Edit (to better answer the later questions in your post):
1/ What is meant by proper closure? - see above
2/ What will happen if the my is omitted? - It won't work. Try it and see what happens.
3/ How come a typeglob is working with a lexical variable? - The variable $name is only being used to define the name of the typeglob to work with. It's not actually being used as a typeglob.
Does that help more?

perl encapsulate single variable in double quotes

In Perl, is there any reason to encapsulate a single variable in double quotes (no concatenation) ?
I often find this in the source of the program I am working on (writen 10 years ago by people that don't work here anymore):
my $sql_host = "something";
my $sql_user = "somethingelse";
# a few lines down
my $db = sub_for_sql_conection("$sql_host", "$sql_user", "$sql_pass", "$sql_db");
As far as I know there is no reason to do this. When I work in an old script I usualy remove the quotes so my editor colors them as variables not as strings.
I think they saw this somewhere and copied the style without understanding why it is so. Am I missing something ?
Thank you.
All this does is explicitly stringify the variables. In 99.9% of cases, it is a newbie error of some sort.
There are things that may happen as a side effect of this calling style:
my $foo = "1234";
sub bar { $_[0] =~ s/2/two/ }
print "Foo is $foo\n";
bar( "$foo" );
print "Foo is $foo\n";
bar( $foo );
print "Foo is $foo\n";
Here, stringification created a copy and passed that to the subroutine, circumventing Perl's pass by reference semantics. It's generally considered to be bad manners to munge calling variables, so you are probably okay.
You can also stringify an object or other value here. For example, undef stringifies to the empty string. Objects may specify arbitrary code to run when stringified. It is possible to have dual valued scalars that have distinct numerical and string values. This is a way to specify that you want the string form.
There is also one deep spooky thing that could be going on. If you are working with XS code that looks at the flags that are set on scalar arguments to a function, stringifying the scalar is a straight forward way to say to perl, "Make me a nice clean new string value" with only stringy flags and no numeric flags.
I am sure there are other odd exceptions to the 99.9% rule. These are a few. Before removing the quotes, take a second to check for weird crap like this. If you do happen upon a legit usage, please add a comment that identifies the quotes as a workable kludge, and give their reason for existence.
In this case the double quotes are unnecessary. Moreover, using them is inefficient as this causes the original strings to be copied.
However, sometimes you may want to use this style to "stringify" an object. For example, URI ojects support stringification:
my $uri = URI->new("http://www.perl.com");
my $str = "$uri";
I don't know why, but it's a pattern commonly used by newcomers to Perl. It's usually a waste (as it is in the snippet you posted), but I can think of two uses.
It has the effect of creating a new string with the same value as the original, and that could be useful in very rare circumstances.
In the following example, an explicit copy is done to protect $x from modification by the sub because the sub modifies its argument.
$ perl -E'
sub f { $_[0] =~ tr/a/A/; say $_[0]; }
my $x = "abc";
f($x);
say $x;
'
Abc
Abc
$ perl -E'
sub f { $_[0] =~ tr/a/A/; say $_[0]; }
my $x = "abc";
f("$x");
say $x;
'
Abc
abc
By virtue of creating a copy of the string, it stringifies objects. This could be useful when dealing with code that alters its behaviour based on whether its argument is a reference or not.
In the following example, explicit stringification is done because require handles references in #INC differently than strings.
$ perl -MPath::Class=file -E'
BEGIN { $lib = file($0)->dir; }
use lib $lib;
use DBI;
say "ok";
'
Can't locate object method "INC" via package "Path::Class::Dir" at -e line 4.
BEGIN failed--compilation aborted at -e line 4.
$ perl -MPath::Class=file -E'
BEGIN { $lib = file($0)->dir; }
use lib "$lib";
use DBI;
say "ok";
'
ok
In your case quotes are completely useless. We can even says that it is wrong because this is not idiomatic, as others wrote.
However quoting a variable may sometime be necessary: this explicitely triggers stringification of the value of the variable. Stringification may give a different result for some values if thoses values are dual vars or if they are blessed values with overloaded stringification.
Here is an example with dual vars:
use 5.010;
use strict;
use Scalar::Util 'dualvar';
my $x = dualvar 1, "2";
say 0+$x;
say 0+"$x";
Output:
1
2
My theory has always been that it's people coming over from other languages with bad habits. It's not that they're thinking "I will use double quotes all the time", but that they're just not thinking!
I'll be honest and say that I used to fall into this trap because I came to Perl from Java, so the muscle memory was there, and just kept firing.
PerlCritic finally got me out of the habit!
It definitely makes your code more efficient, but if you're not thinking about whether or not you want your strings interpolated, you are very likely to make silly mistakes, so I'd go further and say that it's dangerous.