Undocumented Perl variable %_? - perl

I recently discovered what seems to be an undocumented variable in Perl, %_. I don't remember exactly how I stumbled across it (it was last week), but I had a typo in my code where I was using map and instead of $_->{key} I used $_{key}. When I found the mistake, I was surprised that it didn't generate an error, and I verified that use strict and use warnings were in place.
So, I did up a small test, and sure enough it runs without any warnings or errors:
$ perl
use strict;
use warnings;
print keys %_;
$
So, all I can figure is that %_ is defined somewhere. I can't find it in perlvar, so what's the deal? It doesn't have any contents in the script above.

Punctuation variables are exempt from strict. That's why you don't have to use something like our $_; before using $_. From perlvar,
Perl identifiers that begin with digits, control characters, or punctuation characters [...] are also exempt from strict 'vars' errors.
%_ isn't undocumented. From perlvar,
Perl variable names may also be a sequence of digits or a single punctuation or control character (with the literal control character form deprecated). These names are all reserved for special uses by Perl
You can have a hash named _ because _ is a valid name for a variable. (I'm sure you are familiar with $_ and @_.)
No Perl builtin currently sets it or reads %_ implicitly, but punctuation variables such as %_ are reserved.
Note that punctuation variables are also special in that they are "super globals". This means that unqualified %_ refers to %_ in the root package, not %_ in the current package.
$ perl -E'
%::x = ( "%::x" => 1 );
%::_ = ( "%::_" => 1 );
%Foo::x = ( "%Foo::x" => 1 );
%Foo::_ = ( "%Foo::_" => 1 );
package Foo;
say "%x = ", keys(%x);
say "%_ = ", keys(%_);
say "%::x = ", keys(%::x);
say "%::_ = ", keys(%::_);
say "%Foo::x = ", keys(%Foo::x);
say "%Foo::_ = ", keys(%Foo::_);
'
%x = %Foo::x
%_ = %::_ <-- surprise!
%::x = %::x
%::_ = %::_
%Foo::x = %Foo::x
%Foo::_ = %Foo::_
This means that forgetting to use local %_ (as you did) can have very far-reaching effects.

It's not undocumented, it's just unused. You'll find it's always empty.
perldoc perlvar says this:
Perl variable names may also be a sequence of digits or a single punctuation or control character ... These names are all reserved for special uses by Perl; for example, the all-digits names are used to hold data captured by backreferences after a regular expression match.
So %_ is reserved but unused.
Hash variables are the least common, so you will find that you can use %1, %( etc. as well (code like $({xx} = 99 is fine), but you will get no warning because of backward-compatibility issues.
Valid general-purpose variable names must start with a letter (with the utf8 pragma in place that may be any character with the Unicode letter property) or an ASCII underscore, when it must be followed by at least one other character

$_ is a global variable. Global variables live in symbol tables, and the built-in punctuation variables all live in the symbol table for package main.
You can see the contents of the symbol table for main like this:
$ perl -MData::Dumper -e'print Dumper \%main::' # or \%:: for short
$VAR1 = {
'/' => *{'::/'},
',' => *{'::,'},
'"' => *{'::"'},
'_' => *::_,
# and so on
};
All of the above entries are typeglobs, indicated by the * sigil. A typeglob is like a container with slots for all of the different Perl types (e.g. SCALAR, ARRAY, HASH, CODE).
A typeglob allows you to use different variables with the same identifier (the name after the sigil):
${ *main::foo{SCALAR} } # long way of writing $main::foo
@{ *main::foo{ARRAY} } # long way of writing @main::foo
%{ *main::foo{HASH} } # long way of writing %main::foo
The values of $_, @_, and %_ are all stored in the main symbol table entry with key _. When you access %_, you're actually accessing the HASH slot in the *main::_ typeglob (*::_ for short).
strict 'vars' will normally complain if you try to access a global variable without the fully-qualified name, but punctuation variables are exempt:
use strict;
*main::foo = \'bar'; # assign 'bar' to SCALAR slot
print $main::foo; # prints 'bar'
print $foo; # error: Variable "$foo" is not imported
# Global symbol "$foo" requires explicit package name
print %_; # no error
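The slot relationship described above can be checked directly. This is only a minimal sketch: same_var is a hypothetical helper introduced here for illustration, and it relies on the fact that comparing two references numerically compares their addresses.

```perl
use strict;
use warnings;

# Hypothetical helper: true if both references point at the same variable.
sub same_var {
    my ($x, $y) = @_;
    return $x == $y ? 1 : 0;
}

%_ = ( answer => 42 );    # no declaration needed under strict 'vars'

# \%_ and the HASH slot of *main::_ should be one and the same hash.
print same_var( \%_, *main::_{HASH} )   ? "same hash\n"   : "different hash\n";

# Likewise for $_ and the SCALAR slot.
print same_var( \$_, *main::_{SCALAR} ) ? "same scalar\n" : "different scalar\n";
```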

Related

Seeing value of Perl variable created at runtime

My code is below:
use strict;
my $store = 'Media Markt';
my $sentence = "I visited [store]";
# Replace characters "[" and "]"
$sentence =~ s/\[/\$/g;
$sentence =~ s/\]//g;
print $sentence;
I see following at screen:
I visited $store
Is it possible to see following? I want to see value of $store:
I visited Media Markt
You seem to be thinking of using a string, 'store', in order to build a variable name, $store. This gets to the subject of symbolic references, and you do not want to go there.
One way to do what you want is to build a hash that relates such strings to corresponding variables. Then capture the bracketed strings in the sentence and replace them by their hash values:
use warnings;
use strict;
my $store = 'Media Markt';
my $time = 'morning';
my %repl = ( store => $store, time => $time );
my $sentence = "I visited [store] in the [time]";
$sentence =~ s/\[ ([^]]+) \]/$repl{$1}/gex;
print "$sentence\n";
This prints the line I visited Media Markt in the morning
The regex captures anything between [ ], by using the negated character class [^]] (any char other than ]), matched one-or-more times (+). Then it replaces that with its value in the hash, using /e to evaluate the replacement side as an expression. Since brackets are matched as well they end up being removed. The /x allows spaces inside, for readability.
For each string found in brackets there must be a key-value pair in the hash or you'll get a warning. To account for this, we can provide an alternative:
$sentence =~ s{\[ ([^]]+) \]}{$repl{$1}//"[$1]"}gex;
The defined-or operator (//) puts back "[$1]" if $repl{$1} returns undef (no key $1 in the hash, or it has undef value). Thus strings which have no hash pairs are unchanged. I changed the delimiters to s{}{} so that // can be used inside.
This does not allow nesting (like [store [name]]), does not handle multiline strings, and has other limitations. But it should work for reasonable cases.
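For reuse, the same substitution could be wrapped in a small subroutine. This is a sketch only: fill_template is a hypothetical name, the [friend] placeholder is made up for illustration, and the // operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: replace [name] placeholders with values from a
# hash reference, leaving unknown placeholders exactly as they were.
sub fill_template {
    my ($text, $values) = @_;
    $text =~ s{\[ ([^\]]+) \]}{ $values->{$1} // "[$1]" }gex;
    return $text;
}

print fill_template(
    "I visited [store] in the [time] with [friend]\n",
    { store => 'Media Markt', time => 'morning' },
);
```

Here [store] and [time] are filled in, while [friend] has no entry in the hash and is left in place.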
As I told you on the Perl Programmers Facebook group, this is very similar to one of the answers in the Perl FAQ.
How can I expand variables in text strings?
If you can avoid it, don't, or if you can use a templating system, such as Text::Template or Template Toolkit, do that instead. You might even be able to get the job done with sprintf or printf:
my $string = sprintf 'Say hello to %s and %s', $foo, $bar;
However, for the one-off simple case where I don't want to pull out a full templating system, I'll use a string that has two Perl scalar variables in it. In this example, I want to expand $foo and $bar to their variable's values:
my $foo = 'Fred';
my $bar = 'Barney';
$string = 'Say hello to $foo and $bar';
One way I can do this involves the substitution operator and a double /e flag. The first /e evaluates $1 on the replacement side and turns it into $foo. The second /e starts with $foo and replaces it with its value. $foo, then, turns into 'Fred', and that's finally what's left in the string:
$string =~ s/(\$\w+)/$1/eeg; # 'Say hello to Fred and Barney'
The /e will also silently ignore violations of strict, replacing undefined variable names with the empty string. Since I'm using the /e flag (twice even!), I have all of the same security problems I have with eval in its string form. If there's something odd in $foo, perhaps something like @{[ system "rm -rf /" ]}, then I could get myself in trouble.
To get around the security problem, I could also pull the values from a hash instead of evaluating variable names. Using a single /e, I can check the hash to ensure the value exists, and if it doesn't, I can replace the missing value with a marker, in this case ??? to signal that I missed something:
my $string = 'This has $foo and $bar';
my %Replacements = (
foo => 'Fred',
);
# $string =~ s/\$(\w+)/$Replacements{$1}/g;
$string =~ s/\$(\w+)/
exists $Replacements{$1} ? $Replacements{$1} : '???'
/eg;
print $string;
And the actual (but really not recommended - for the reasons explained in the FAQ above) answer to your question is:
$sentence =~ s/\[(\w+)]/'$' . $1/ee;

What is "%_" in perl?

I've just been given a code snippet:
@list = grep { !$_{$_}++ } @list;
As an idiom for deduplication. It seems to work, but - there's no %_ listed in perlvar.
I'd normally be writing the above by declaring %seen e.g.:
my %seen; my @list = grep { not $seen{$_}++ } @list;
But %_ seems to work, although it seems to be global scope. Can anyone point me to a reference for it? (Or at least reassure me that doing the above isn't smashing something important!)
It's a hash. You can have a hash named _ because _ is a valid name for a variable. (I'm sure you are familiar with $_ and @_.)
No Perl builtin currently sets it or reads %_ implicitly, but punctuation variables such as %_ are reserved.
Perl variable names may also be a sequence of digits or a single punctuation or control character (with the literal control character form deprecated). These names are all reserved for special uses by Perl
Note that punctuation variables are also special in that they are "super globals". This means that unqualified %_ refers to %_ in the root package, not %_ in the current package.
$ perl -E'
%::x = ( name => "%::x" );
%::_ = ( name => "%::_" );
%Foo::x = ( name => "%Foo::x" );
%Foo::_ = ( name => "%Foo::_" );
package Foo;
say "%::x = $::x{name}";
say "%::_ = $::_{name}";
say "%Foo::x = $Foo::x{name}";
say "%Foo::_ = $Foo::_{name}";
say "%x = $x{name}";
say "%_ = $_{name}";
'
%::x = %::x
%::_ = %::_
%Foo::x = %Foo::x
%Foo::_ = %Foo::_
%x = %Foo::x
%_ = %::_ <-- surprise!
This means that forgetting to use local %_ (as you did) can have very far-reaching effects.
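To make the "far-reaching effects" concrete, here is a small sketch comparing the %_ idiom with a lexical %seen. The two subroutine names are hypothetical and exist only for this comparison.

```perl
use strict;
use warnings;

# Hypothetical helpers for comparison.
# dedupe_global counts in the global %_, so the counts survive between calls.
sub dedupe_global  { return grep { !$_{$_}++ } @_ }

# dedupe_lexical counts in a fresh lexical hash on every call.
sub dedupe_lexical { my %seen; return grep { !$seen{$_}++ } @_ }

my @first  = dedupe_global(qw( a b a ));
my @second = dedupe_global(qw( a c ));    # "a" is already recorded in %_
print "global:  @first | @second\n";

my @third  = dedupe_lexical(qw( a b a ));
my @fourth = dedupe_lexical(qw( a c ));
print "lexical: @third | @fourth\n";
```

With the global version, the second call silently drops "a" because the first call already recorded it. Adding local %_; at the top of dedupe_global would avoid that, at which point a lexical %seen is the simpler choice.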

Perl dereferencing in non-strict mode

In Perl, if I have:
no strict;
@ARY = (58, 90);
To operate on an element of the array, say, the 2nd one, I would write (possibly as part of a larger expression):
$ARY[1] # The most common way found in Perldoc's idioms.
Though, for some reason these also work:
@ARY[1]
@{ARY[1]}
All of them result in the same object:
print (\$ARY[1]);
print (\@ARY[1]);
print (\@{ARY[1]});
Output:
SCALAR(0x9dbcdc)
SCALAR(0x9dbcdc)
SCALAR(0x9dbcdc)
What are the syntax rules that enable this sort of construct? How reliably could one write program code with each of these constructs, or with a mix of them? How interchangeable are these expressions? (Always speaking in a non-strict context.)
To justify how I came to this question: I agree that "use strict" is the better practice, but I'm still interested in understanding how non-strict expressions are built up.
In an attempt to help myself with this, I came across:
The notion that "no strict;" does not complain about undeclared variables and quirky syntax.
The prefix dereference having higher precedence than the subscript [] (perldsc § "Caveat on precedence").
The clarification on when to use @ instead of $ (perldata § "Slices").
The lack of a "[]" (array subscript / slice) description among Perl's operators (perlop), which led me to think it is not an operator... (yet it has to be something. But what?).
From what I have learned, none of these hints, put together, helps me better understand my issue.
Thanks in advance.
Quotation from perlfaq4:
What is the difference between $array[1] and @array[1]?
The difference is the sigil, that special character in front of the array name. The $ sigil means "exactly one item", while the @ sigil means "zero or more items". The $ gets you a single scalar, while the @ gets you a list.
Please see: What is the difference between $array[1] and @array[1]?
@ARY[1] is indeed a slice, in fact a slice of only one member. The difference is it creates a list context:
@ar1[0] = qw( a b c ); # List context.
$ar2[0] = qw( a b c ); # Scalar context, the last value is returned.
print "<@ar1> <@ar2>\n";
Output:
<a> <c>
Besides using strict, turn warnings on, too. You'll get the following warning:
Scalar value @ar1[0] better written as $ar1[0]
In perlop, you can read that "Perl's prefix dereferencing operators are typed: $, @, %, and &." The standard syntax is SIGIL { ... }, but in the simple cases, the curly braces can be omitted.
See Can you use string as a HASH ref while "strict refs" in use? for some fun with no strict refs and its emulation under strict.
Extending choroba's answer, to check a particular context, you can use wantarray
sub context { return wantarray ? "LIST" : "SCALAR" }
print $ary1[0] = context(), "\n";
print @ary1[0] = context(), "\n";
Outputs:
SCALAR
LIST
Nothing you did requires no strict; other than to hide your error of doing
@ARY = (58, 90);
when you should have done
my @ARY = (58, 90);
The following returns a single element of the array. Since EXPR is to return a single index, it is evaluated in scalar context.
$array[EXPR]
e.g.
my @array = qw( a b c d );
my $index = 2;
my $ele = $array[$index]; # my $ele = 'c';
The following returns the elements identified by LIST. Since LIST is to return 0 or more elements, it must be evaluated in list context.
@array[LIST]
e.g.
my @array = qw( a b c d );
my @indexes = ( 1, 2 );
my @slice = @array[@indexes]; # my @slice = qw( b c );
\( $ARY[$index] ) # Returns a ref to the element returned by $ARY[$index]
\( @ARY[@indexes] ) # Returns refs to each element returned by @ARY[@indexes]
${foo} # Weird way of writing $foo. Useful in literals, e.g. "${foo}bar"
@{foo} # Weird way of writing @foo. Useful in literals, e.g. "@{foo}bar"
${foo}[...] # Weird way of writing $foo[...].
Most people don't even know you can use these outside of string literals.
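A short sketch pulling the element, slice, and reference forms together under strict. The helper name slice_refs is hypothetical and only serves to show that \( ... ) over a slice yields one reference per element.

```perl
use strict;
use warnings;

# Hypothetical helper: return one reference per requested element.
sub slice_refs {
    my ($array, @indexes) = @_;
    return \( @{$array}[@indexes] );
}

my @array = qw( a b c d );

my $element = $array[2];       # single element, index in scalar context
my @slice   = @array[1, 2];    # slice: zero or more elements

my @refs = slice_refs( \@array, 1, 2 );
${ $refs[0] } = 'B';           # writes through to $array[1]

print "$element | @slice | @array\n";
```

The slice is a copy taken before the write, so it still holds the old values, while @array itself reflects the change made through the reference.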

Sub references as arguments in perl

I have a Perl function and I don't know what it does.
my: what does min mean in Perl?
@ARGV: what does it mean?
sub getArgs
{
my $argCnt=0;
my %argH;
for my $arg (@ARGV)
{
if ($arg =~ /^-/) # insert this entry and the next in the hash table
{
$argH{$ARGV[$argCnt]} = $ARGV[$argCnt+1];
}
$argCnt++;
}
return %argH;}
Code like that makes David sad...
Here's a reformatted version of the code doing the indentations correctly. That makes it so much easier to read. I can easily tell where my if and loops start and end:
sub getArgs {
my $argCnt = 0;
my %argH;
for my $arg ( @ARGV ) {
if ( $arg =~ /^-/ ) { # insert this entry and the next in the hash table
$argH{ $ARGV[$argCnt] } = $ARGV[$argCnt+1];
}
$argCnt++;
}
return %argH;
}
@ARGV is what is passed to the program. It is an array of all the arguments passed. For example, I have a program foo.pl, and I call it like this:
foo.pl one two three four five
In this case, @ARGV is set to the list of values ("one", "two", "three", "four", "five"). The name comes from a similar variable found in the C programming language.
The author is attempting to parse these arguments. For example:
foo.pl -this that -the other
would result in:
$argH{"-this"} = "that";
$argH{"-the"} = "other";
I don't see min. Do you mean my?
This is a wee bit of a complex discussion which would normally involve package variables vs. lexically scoped variables, and how Perl stores variables. To make things easier, I'm going to give you a sort-of correct, but technically wrong answer: If you use the strict pragma, and you should, you have to declare your variables with my before they can be used. For example, here's a simple two-line program that's wrong. Can you see the error?
$name = "Bob";
print "Hello $Name, how are you?\n";
Note that when I set $name to "Bob", $name is spelled with a lowercase n. But I used $Name (uppercase N) in my print statement. As it stands now, Perl will print out "Hello , how are you?" without a care that I've used the wrong variable name. If it's hard to spot an error like this in a two-line program, imagine what it would be like in a 1000-line program.
By using strict and forcing me to declare variables with my, Perl can catch that error:
use strict;
use warnings; # Another Pragma that should always be used
my $name = "Bob";
print "Hello $Name, how are you doing\n";
Now, when I run the program, I get the following error:
Global symbol "$Name" requires explicit package name at (line # of print statement)
This means that $Name isn't defined, and Perl points to where that error is.
When you define variables like this, they are in scope within the block where they're defined. A block could be the code contained in a set of curly braces or a while, if, or for statement. If you define a variable with my outside of these, it's defined to the end of the file.
Thus, by using my, the variables are only defined inside this subroutine. And, the $arg variable is only defined in the for loop.
One more thing:
The person who wrote this should have used the Getopt::Long module. There's a major bug in their code:
For example:
foo.pl -this that -one -two
In this case, my hash looks like this:
$argH{'-this'} = "that";
$argH{'-one'} = "-two";
$argH{'-two'} = undef;
If I did this:
if ( defined $argH{'-two'} ) {
...
}
I would not execute the if statement.
Also:
foo.pl -this=that -one -two
would also fail.
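For comparison, here is a minimal sketch of the same parsing done with Getopt::Long. The wrapper name parse_args is hypothetical, and the option names (this, one, two) are simply taken from the examples above. Declaring -this as taking a value and -one and -two as plain flags is an assumption about what the original author intended.

```perl
use strict;
use warnings;
use Getopt::Long qw( GetOptionsFromArray );

# Hypothetical wrapper: parse a list of arguments into a hash
# without touching the global @ARGV.
sub parse_args {
    my @args = @_;
    my %opt;
    GetOptionsFromArray(
        \@args, \%opt,
        'this=s',    # takes a string value
        'one',       # plain flag
        'two',       # plain flag
    ) or die "Bad command-line arguments\n";
    return %opt;
}

my %opt = parse_args(qw( -this=that -one -two ));
print join( ', ', map { "$_=$opt{$_}" } sort keys %opt ), "\n";
```

In a real script, GetOptions( \%opt, ... ) would read @ARGV directly. With these declarations both "-this that" and "-this=that" are accepted, and -one no longer swallows -two as its value.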
@ARGV is a special variable (refer to perldoc perlvar):
@ARGV
The array @ARGV contains the command-line arguments intended for the
script. $#ARGV is generally the number of arguments minus one, because
$ARGV[0] is the first argument, not the program's command name itself.
See $0 for the command name.
Perl documentation is also available from your command line:
perldoc -v @ARGV

Recommended method to refactor variable name in Perl code?

I can use Perltidy to reformat source. Quite useful.
If a source file uses a variable like @m, how can I most easily refactor that into something else, e.g. @miles_travelled?
Using a regular expression to rename does not appear safe, because a separate variable such as $m may also exist (with a different type, in this case a scalar), yet the @m variable can be referenced using an expression like $m[$i].
For example, none of the following will be correct for Perl code:
s/([\$\@])m/$1miles_travelled/g # Will rename scalar with same name
s/\$m/\$miles_travelled/g # Will fail to rename accesses of array
Is there a recommended tool or method for safely renaming a variable name in Perl code?
The variable $m always occurs as $m.
The variable @m always occurs as @m or $m[...].
The variable %m always occurs as %m or $m{...} or @m{...}.
… except with indirect method calls: new $m[...] parses as $m->new([...]). But we can probably ignore this case (use no indirect to make sure).
If we want to cover the first three cases properly, we can
replace a scalar by s/(?<=\$)OLDNAME(?!\s*[\[\{])/NEWNAME/g
replace an array by s/(?<=\@)OLDNAME(?!\{)|(?<=\$)OLDNAME(?=\s*\[)/NEWNAME/g
replace a hash by s/(?<=\%)OLDNAME|(?<=[\$\@])OLDNAME(?=\s*\{)/NEWNAME/g
Note that lookarounds or multiple passes for the different cases are necessary.
Test:
use Test::More tests => 3;
my $scalar_re = qr/(?<=\$) foo (?!\s*[\[\{])/x;
my $array_re = qr/(?<=\@) foo (?!\{) | (?<=\$) foo (?=\s*\[)/x;
my $hash_re = qr/(?<=\%) foo | (?<=[\$\@]) foo (?=\s*\{)/x;
my $input = '$foo, $foo[1], @foo, $foo{a}, %foo, @foo{qw/a b/}';
my $scalar = '$bar, $foo[1], @foo, $foo{a}, %foo, @foo{qw/a b/}';
my $array = '$foo, $bar[1], @bar, $foo{a}, %foo, @foo{qw/a b/}';
my $hash = '$foo, $foo[1], @foo, $bar{a}, %bar, @bar{qw/a b/}';
is $input =~ s/$scalar_re/bar/xrg, $scalar;
is $input =~ s/$array_re /bar/xrg, $array;
is $input =~ s/$hash_re /bar/xrg, $hash;
The Padre editor will carry out a small number of simple refactorings automatically for you. "Rename variable" is one of them.