What is "%_" in Perl?

I've just been given this code snippet as an idiom for deduplication:
@list = grep { !$_{$_}++ } @list;
It seems to work, but there's no %_ listed in perlvar.
I'd normally write this by declaring a %seen hash, e.g.:
my %seen; my @list = grep { not $seen{$_}++ } @list;
But %_ seems to work, although it seems to be global scope. Can anyone point me to a reference for it? (Or at least reassure me that doing the above isn't smashing something important!)

It's a hash. You can have a hash named _ because _ is a valid name for a variable. (I'm sure you are familiar with $_ and @_.)
No Perl builtin currently sets it or reads %_ implicitly, but punctuation variables such as %_ are reserved.
From perlvar: Perl variable names may also be a sequence of digits or a single punctuation or control character (with the literal control character form deprecated). These names are all reserved for special uses by Perl.
Note that punctuation variables are also special in that they are "super globals". This means that unqualified %_ refers to %_ in the root package, not %_ in the current package.
$ perl -E'
%::x = ( name => "%::x" );
%::_ = ( name => "%::_" );
%Foo::x = ( name => "%Foo::x" );
%Foo::_ = ( name => "%Foo::_" );
package Foo;
say "%::x = $::x{name}";
say "%::_ = $::_{name}";
say "%Foo::x = $Foo::x{name}";
say "%Foo::_ = $Foo::_{name}";
say "%x = $x{name}";
say "%_ = $_{name}";
'
%::x = %::x
%::_ = %::_
%Foo::x = %Foo::x
%Foo::_ = %Foo::_
%x = %Foo::x
%_ = %::_ <-- surprise!
This means that forgetting to use local %_ (as you did) can have very far-reaching effects.
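A minimal sketch comparing the two deduplication idioms from the question, assuming nothing else in the program relies on %_; the sample list is invented for illustration. The bare block limits the lifetime of local %_, so the global is restored as soon as the block exits.

```perl
use strict;
use warnings;

my @list = qw(b a b c a);

# Variant 1: the %_ idiom, but with local so the global %main::_ is
# restored when the enclosing block exits.
my @dedup_underscore;
{
    local %_;    # temporarily empty %_ for this block only
    @dedup_underscore = grep { !$_{$_}++ } @list;
}

# Variant 2: a lexical %seen hash, which never touches any global.
my %seen;
my @dedup_seen = grep { !$seen{$_}++ } @list;

print "@dedup_underscore\n";    # b a c
print "@dedup_seen\n";          # b a c
print scalar(keys %_), "\n";    # 0: %_ was restored after the block
```

The lexical %seen version is generally the safer choice, since it cannot collide with other code that (against convention) uses %_.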

Related

Undocumented Perl variable %_?

I recently discovered what seems to be an undocumented variable in Perl, %_. I don't remember exactly how I stumbled across it (it was last week), but I had a typo in my code where I was using map and instead of $_->{key} I used $_{key}. When I found the mistake, I was surprised that it didn't generate an error, and I verified that use strict and use warnings were in place.
So I put together a small test, and sure enough it runs without any warnings or errors:
$ perl
use strict;
use warnings;
print keys %_;
$
So, all I can figure is that %_ is defined somewhere. I can't find it in perlvar, so what's the deal? It doesn't have any contents in the script above.
Punctuation variables are exempt from strict. That's why you don't have to use something like our $_; before using $_. From perlvar,
Perl identifiers that begin with digits, control characters, or punctuation characters [...] are also exempt from strict 'vars' errors.
%_ isn't undocumented. From perlvar,
Perl variable names may also be a sequence of digits or a single punctuation or control character (with the literal control character form deprecated). These names are all reserved for special uses by Perl
You can have a hash named _ because _ is a valid name for a variable. (I'm sure you are familiar with $_ and @_.)
No Perl builtin currently sets it or reads %_ implicitly, but punctuation variables such as %_ are reserved.
Note that punctuation variables are also special in that they are "super globals". This means that unqualified %_ refers to %_ in the root package, not %_ in the current package.
$ perl -E'
%::x = ( "%::x" => 1 );
%::_ = ( "%::_" => 1 );
%Foo::x = ( "%Foo::x" => 1 );
%Foo::_ = ( "%Foo::_" => 1 );
package Foo;
say "%x = ", keys(%x);
say "%_ = ", keys(%_);
say "%::x = ", keys(%::x);
say "%::_ = ", keys(%::_);
say "%Foo::x = ", keys(%Foo::x);
say "%Foo::_ = ", keys(%Foo::_);
'
%x = %Foo::x
%_ = %::_ <-- surprise!
%::x = %::x
%::_ = %::_
%Foo::x = %Foo::x
%Foo::_ = %Foo::_
This means that forgetting to use local %_ (as you did) can have very far-reaching effects.
It's not undocumented, it's just unused. You'll find it's always empty unless your own code puts something in it.
perldoc perlvar says this:
Perl variable names may also be a sequence of digits or a single punctuation or control character ... These names are all reserved for special uses by Perl; for example, the all-digits names are used to hold data captured by backreferences after a regular expression match.
So %_ is reserved but unused.
Hash variables are the least common of the punctuation variables, so you will find that you can use %1, %( etc. as well (code like $({xx} = 99 is fine), and you will get no warning because of backward-compatibility issues.
Valid general-purpose variable names must start with a letter (with the utf8 pragma in place, that may be any character with the Unicode letter property) or an ASCII underscore, in which case it must be followed by at least one other character.
$_ is a global variable. Global variables live in symbol tables, and the built-in punctuation variables all live in the symbol table for package main.
You can see the contents of the symbol table for main like this:
$ perl -MData::Dumper -e'print Dumper \%main::' # or \%:: for short
$VAR1 = {
'/' => *{'::/'},
',' => *{'::,'},
'"' => *{'::"'},
'_' => *::_,
# and so on
};
All of the above entries are typeglobs, indicated by the * sigil. A typeglob is like a container with slots for all of the different Perl types (e.g. SCALAR, ARRAY, HASH, CODE).
A typeglob allows you to use different variables with the same identifier (the name after the sigil):
${ *main::foo{SCALAR} } # long way of writing $main::foo
@{ *main::foo{ARRAY} }  # long way of writing @main::foo
%{ *main::foo{HASH} }   # long way of writing %main::foo
The values of $_, @_, and %_ are all stored in the main symbol table entry with key _. When you access %_, you're actually accessing the HASH slot in the *main::_ typeglob (*::_ for short).
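To check the claim that unqualified %_ is the HASH slot of *main::_ even inside another package, here is a small sketch; the package name Foo and the stored value are arbitrary.

```perl
use strict;
use warnings;

# Populate the HASH slot of *main::_ through the unqualified name.
%_ = ( name => 'root %_' );

package Foo;

# Even inside package Foo, unqualified %_ resolves to %main::_.
my $via_glob = *main::_{HASH};     # hash reference taken from the glob slot
my $same     = \%_ == $via_glob;   # compare the two references by address

print "same hash: ", ( $same ? "yes" : "no" ), "\n";    # same hash: yes
print "name: $via_glob->{name}\n";                      # name: root %_

package main;
```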
strict 'vars' will normally complain if you try to access a global variable without the fully-qualified name, but punctuation variables are exempt:
use strict;
*main::foo = \'bar'; # assign 'bar' to SCALAR slot
print $main::foo; # prints 'bar'
print $foo; # error: Variable "$foo" is not imported
# Global symbol "$foo" requires explicit package name
print %_; # no error
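Because the snippet above is meant to fail compilation, here is a runnable variant that compiles the two cases separately with string eval, so the strict 'vars' error is captured in $@ instead of stopping the program. The hash name %foo is just an example.

```perl
use strict;
use warnings;

# Compile each snippet at run time with string eval, so a strict failure
# is captured in $@ instead of aborting the whole program.
my $ok_punct = eval q{
    use strict;
    %_ = ( k => 1 );    # punctuation variable: exempt from strict 'vars'
    1;
};

my $ok_plain = eval q{
    use strict;
    %foo = ( k => 1 );  # ordinary global without a package name
    1;
};
my $error = $@;

print "punctuation: ", ( $ok_punct ? "compiles" : "fails" ), "\n";
print "plain:       ", ( $ok_plain ? "compiles" : "fails" ), "\n";
print "error: $error";
```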

What does this code do in Perl: keys(%$hash) ...?

print "Enter the hash \n";
$hash=<STDIN>;chop($hash);
@keys = keys (%$hash);
@values = values (%$hash);
Since Google ignores special characters, there was no way I could find out what the "%$hash" thing does and how this is supposed to work.
keys(%$hash) returns the keys of the hash referenced by the value in $hash. A hash is a type of associative array, which (more or less) means an array that's indexed by strings (called "keys") instead of by numbers.
In this particular case, $hash contains a string. When a string is used as a reference (a "symbolic reference"), dereferencing it accesses the package variable whose name matches the string. This is only permitted when use strict 'refs' is not in effect.
If the full program is
%FOO = ( a=>1, b=>2 );
%BAR = ( c=>3, d=>4 );
print "Enter the hash \n";
$hash=<STDIN>;chop($hash);
@keys = keys(%$hash);
Then,
@keys will contain a and b if the user enters FOO.
@keys will contain c and d if the user enters BAR.
@keys will contain E2BIG, EACCES, EADDRINUSE and many more if the user enters !.
@keys can contain paths if the user enters INC.
@keys will be empty for most other values.
(The keys are returned in an arbitrary order.)
The last three cases are surely unintentional. This is why the posted code is awful. This is what the code should have been:
use strict; # Always use these as they
use warnings 'all'; # find/prevent numerous errors.
my %FOO = ( a=>1, b=>2 );
my %BAR = ( c=>3, d=>4 );
my %inputs = ( FOO => \%FOO, BAR => \%BAR );
print "Enter the name of a hash: ";
my $hash_name = <STDIN>;
chomp($hash_name);
my $hash = $inputs{$hash_name}
or die("No such hash\n");
my @keys = keys(%$hash);
...
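A sketch of how strict changes the behaviour of the original code, with the user's input replaced by a hard-coded string (an assumption made so the example runs without STDIN):

```perl
use strict;
use warnings;

our %FOO = ( a => 1, b => 2 );    # package variable, reachable by name

my $name = 'FOO';    # stands in for the line read from STDIN

# Under strict 'refs', treating a string as a hash reference is a
# run-time error rather than a silent lookup in the symbol table.
my @keys = eval { keys %$name };
my $strict_error = $@;
print "strict refs: $strict_error" if $strict_error;

# Turning strict refs off inside a block reproduces the original
# behaviour: the string 'FOO' is looked up as %main::FOO.
{
    no strict 'refs';
    @keys = sort keys %$name;
}
print "symbolic: @keys\n";    # symbolic: a b
```

The lookup table in the corrected code avoids both problems: it only ever exposes the hashes you put in it.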
keys() returns the keys of the specified hash. In the code you wrote, the name of the hash to look at (and extract the keys and values of) is being specified via STDIN, which is really bizarre behavior.
The code you posted is nonsensical, but what it should be doing is dereferencing a hash reference, provided that you have a valid hash reference stored in your scalar $hash (which you don't).
For example:
use strict;
use warnings;
use Data::Dump;
my $href = {
foo => 'bar',
bat => 'baz',
};
dd(keys(%$href));   # e.g. ("bat", "foo"); hash order varies between runs
dd(values(%$href)); # e.g. ("baz", "bar"), in the same order as keys
The keys() function returns a list consisting of all the keys of the hash. The returned values are copies of the original keys in the hash, so modifying them will not affect the original hash.
The values() function does the same with the values of the hash, with one important difference: the returned values are not copies but aliases, so modifying them (for example, through a foreach loop variable) modifies the hash itself.
So long as a given hash is unmodified, you may rely on keys, values and each to repeatedly return the same order as each other.
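A small sketch of the copy-versus-alias difference between keys and values; the hash contents are made up:

```perl
use strict;
use warnings;

my %h = ( a => 1, b => 2 );

# keys returns copies: changing the loop variable leaves %h unchanged.
for my $k ( keys %h ) {
    $k = uc $k;
}

# values returns aliases: changing the loop variable changes %h.
for my $v ( values %h ) {
    $v *= 10;
}

print join( ", ", map { "$_=$h{$_}" } sort keys %h ), "\n";    # a=10, b=20
```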
For more help with references, see perlreftut, perlref, and maybe perldsc if you're feeling adventurous.

Attempt to delete readonly key from a restricted hash, when it is not restricted

I quite often arrange my subroutine entry like this:
sub mySub {
my ($self, %opts) = @_;
lock_keys(%opts, qw(count name));
...
my $name = delete $opts{name};
$self->SUPER::mySub(%opts);
}
to allow calling the sub using named arguments like this:
$obj->mySub(count=>1, name=>'foobar');
The lock_keys guards against calling the sub with mis-spelled argument names.
The last couple of lines are another common idiom I use: if I am writing a method that overrides a superclass method, I might extract the arguments which are specific to the subclass and then chain a call to the superclass.
This worked fine in perl 5.8, but after upgrading to CentOS 6 (which has perl 5.10.1) I started to see seemingly random errors like this:
Attempt to delete readonly key 'otherOption' from a restricted hash at xxx.pl line 9.
These errors do not happen all the time (even in the same subroutine) but they do seem to relate to the call chain that results in calling the sub which bombs out.
Also note that they do not happen on perl 5.16 (or at least not on ideone).
What is causing these errors in perl 5.10? According to the manpage for Hash::Util, delete() should still work after lock_keys. It is like the whole hash is getting locked somehow.
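For reference, here is how lock_keys is expected to behave on an ordinary hash whose values were not copied from another hash's keys: deleting an allowed key works, and only a key outside the allowed set dies. This is a sketch with made-up keys; the misspelled key nmae is deliberate.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys);

my %opts = ( count => 1, name => 'foobar' );
lock_keys( %opts, qw(count name) );    # no commas inside qw()

# delete is allowed on a hash whose keyset is locked...
my $name = delete $opts{name};

# ...but touching a key outside the allowed set dies.
my $ok  = eval { $opts{nmae} = 'typo'; 1 };
my $err = $@;

print "deleted: $name\n";
print "typo rejected: ", ( $ok ? "no" : "yes" ), "\n";
print "error: $err";
```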
I found the answer to this even before posting on SO, but the workaround is not great so feel free to chime in with a better one.
This SSCCE exhibits the problem:
#!/usr/bin/perl
use strict;
use Hash::Util qw(lock_keys);
sub doSomething {
my ($a, $b, %opts) = @_;
lock_keys(%opts, qw(myOption otherOption));
my $x = delete $opts{otherOption};
}
my %h = (
a=>1,
b=>2
);
foreach my $k (keys %h) {
doSomething(1, 2, otherOption=>$k);
}
It seems that the problem is related to the values passed in to the named-argument hash (%opts in my example). If these values are copied from the keys of a hash, as in the example above, they are marked read-only in a way that later prevents deleting keys from the hash.
In fact, you can see this using Devel::Peek:
$ perl -e'
use Devel::Peek;
my %x=(a=>1);
foreach my $x (keys %x) {
my %y = (x => $x);
Dump($x);
Dump(\%y);
}
'
SV = PV(0x22cfb78) at 0x22d1fd0
REFCNT = 2
FLAGS = (POK,FAKE,READONLY,pPOK)
PV = 0x22f8450 "a"
CUR = 1
LEN = 0
SV = RV(0x22eeb30) at 0x22eeb20
REFCNT = 1
FLAGS = (TEMP,ROK)
RV = 0x22f8880
SV = PVHV(0x22d7fb8) at 0x22f8880
REFCNT = 2
FLAGS = (PADMY,SHAREKEYS)
ARRAY = 0x22e99a0 (0:7, 1:1)
hash quality = 100.0%
KEYS = 1
FILL = 1
MAX = 7
RITER = -1
EITER = 0x0
Elt "x" HASH = 0x9303a5e5
SV = PV(0x22cfc88) at 0x22d1b98
REFCNT = 1
FLAGS = (POK,FAKE,READONLY,pPOK)
PV = 0x22f8450 "a"
CUR = 1
LEN = 0
Note that the FLAGS for the hash entry include READONLY, and in fact the variable $x and the corresponding value in %y point at the same string (PV = 0x22f8450 in my example above). It seems that Perl 5.10 tries hard to avoid copying strings, but in doing so has inadvertently locked the whole hash.
The workaround I am using is to force a string copy, like this:
foreach my $k (keys %h) {
my $j = "$k";
doSomething(1, 2, otherOption=>$j);
}
This seems an inefficient way to force a string copy, and in any case is easy to forget, so other answers containing better workarounds are welcome.

in Perl, how do i write regex matched string

I want to define the replacement for $1 somewhere else (in a table of rules):
my $converting_rules = +{
'(.+?)' => '$1',
};
my $pre = $converting_rule_key;
my $post = $converting_rules->{$converting_rule_key};
# $path_file =~ s/$pre/$post/;  # Bad...
$path_file =~ s/$pre/$1/;       # Good!
In the Bad case, $1 is treated as the literal string '$1'.
But I want it to be the matched string.
I have no idea what to do... please help!
The trouble is that s/$pre/$post/ interpolates the variables $pre and $post, but will not recursively interpolate anything in them that happens to look like a variable. So you want to add an extra eval to the replacement, with the /ee flag:
$path_file =~ s/$pre/$post/ee;
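A sketch of /ee with a concrete rule; the pattern and file name are hypothetical, chosen so the capture is visible. The second e evaluates the stored string as Perl code, so this is only safe when the rules are trusted, and a template that mixes captures with literal text has to be stored as a quoted Perl expression:

```perl
use strict;
use warnings;

my $pre  = '(.+?)\.txt';
my $post = '$1';          # a Perl expression stored as a string

my $path_file = 'report.txt';

# First /e evaluates $post to the string '$1'; the second /e evaluates
# that string as Perl code, which yields the captured text.
( my $result = $path_file ) =~ s/$pre/$post/ee;
print "$result\n";     # report

# Captures mixed with literal text: store a double-quoted expression.
my $post2 = '"$1.bak"';
( my $result2 = $path_file ) =~ s/$pre/$post2/ee;
print "$result2\n";    # report.bak
```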
$x = '$1.00';
print qq/$x/;
prints $1.00, so it's no surprise that
$x = '$1.00';
s/(abc)/$x/;
substitutes with $1.00.
What you have there is a template, yet you did nothing to process this template. String::Interpolate can handle such templates.
use String::Interpolate qw( interpolate );
$rep = '$1';
s/$pat/ interpolate($rep) /e;
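If you'd rather not execute the template as code (/ee) or add a CPAN dependency, another option is to expand $1, $2, ... in the template yourself. This is a different technique from String::Interpolate, and the helper expand_captures below is hypothetical; it only understands $ followed by digits, and the pattern and file name are made up:

```perl
use strict;
use warnings;

# Hypothetical helper: replace each $N in $template with the Nth value
# of @caps. Nothing in the template is executed as code.
sub expand_captures {
    my ( $template, @caps ) = @_;    # copy the captures before matching again
    ( my $out = $template ) =~
        s{\$(\d+)}{ $1 >= 1 && defined $caps[$1 - 1] ? $caps[$1 - 1] : '' }ge;
    return $out;
}

my $pat  = '(\w+)\.(\w+)';
my $rep  = '$1-backup.$2';
my $file = 'report.txt';

$file =~ s/$pat/expand_captures($rep, $1, $2)/e;

print "$file\n";    # report-backup.txt
```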

Is %$var dereferencing a Perl hash?

I'm sending a subroutine a hash, and fetching it with my($arg_ref) = @_;
But what exactly is %$arg_ref? Is %$ dereferencing the hash?
$arg_ref is a scalar since it uses the $ sigil. Presumably, it holds a hash reference. So yes, %$arg_ref dereferences that hash reference. Another way to write it is %{$arg_ref}. This makes the intent of the code a bit more clear, though more verbose.
To quote from perldata(1):
Scalar values are always named with '$', even when referring
to a scalar that is part of an array or a hash. The '$'
symbol works semantically like the English word "the" in
that it indicates a single value is expected.
$days # the simple scalar value "days"
$days[28] # the 29th element of array @days
$days{'Feb'} # the 'Feb' value from hash %days
$#days # the last index of array @days
So your example would be:
%$arg_ref # the hash referenced by the scalar $arg_ref
my($arg_ref) = @_; grabs the first item in the function's argument list (@_) and places it in a lexical variable called $arg_ref. The caller is responsible for passing a hash reference. A more canonical way to write that is:
my $arg_ref = shift;
To create a hash reference you could start with a hash:
some_sub(\%hash);
Or you can create it with an anonymous hash reference:
some_sub({pi => 3.14, C => 4}); # Pi is a gross approximation.
Instead of dereferencing the entire hash like that, you can grab individual items with
$arg_ref->{key}
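Putting these pieces together, a sketch of a sub that receives a hash reference and uses all three access styles; the sub name describe and the data are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical sub that receives a single hash reference.
sub describe {
    my ($arg_ref) = @_;

    my %copy  = %$arg_ref;               # whole hash dereferenced (shallow copy)
    my @names = sort keys %{$arg_ref};   # same dereference, braces form
    my $pi    = $arg_ref->{pi};          # one element via the arrow

    return scalar( keys %copy ) . " keys (@names), pi is $pi";
}

my %hash = ( pi => 3.14, C => 4 );
print describe( \%hash ), "\n";                            # 2 keys (C pi), pi is 3.14
print describe( { pi => 3.14159, e => 2.718 } ), "\n";     # 2 keys (e pi), pi is 3.14159
```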
A good brief introduction to references (creating them and using them) in Perl is perldoc perlreftut. You can also read it online (or get it as a pdf). (It talks more about references in complex data structures than in terms of passing in and out of subroutines, but the syntax is the same.)
my %hash = ( fred => 'wilma',
barney => 'betty');
my $hashref = \%hash;
my $freds_wife = $hashref->{fred};
my %hash_copy = %$hashref; # or %{$hashref} as noted above.
So, what's the point of the syntax flexibility? Let's try this:
my %flintstones = ( fred   => { wife => 'wilma',
                                kids => ['pebbles'],
                                pets => ['dino'],
                              },
                    barney => { },    # etc ...
                  );
Actually for deep data structures like this it's often more convenient to start with a ref:
my $flintstones = { fred => { wife => 'Wilma',
kids => ['Pebbles'],
pets => ['Dino'],
},
};
OK, so fred gets a new pet, 'Velociraptor'
push @{$flintstones->{fred}->{pets}}, 'Velociraptor';
How many pets does Fred have?
scalar @{ $flintstones->{fred}->{pets} }
Let's feed them ...
for my $pet ( @{ $flintstones->{fred}->{pets} } ) {
feed($pet);
}
and so on. The curly-bracket soup can look a bit daunting at first, but it becomes quite easy to deal with them in the end, so long as you're consistent in the way that you deal with them.
Since it's fairly clear this construct is being used to pass a hash reference as a set of named arguments to a sub, it should also be noted that this
sub foo {
my ($arg_ref) = @_;
# do something with $arg_ref->{keys}
}
may be overkill compared to just unpacking @_
sub bar {
my ($a, $b, $c) = @_;
return $c / ( $a * $b );
}
Depending on how complex the argument list is.
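To make the trade-off concrete, a sketch of the same calculation written both ways; the sub names and the box dimensions are hypothetical:

```perl
use strict;
use warnings;

# Named arguments via a hash reference: callers can pass keys in any order.
sub volume_named {
    my ($arg_ref) = @_;
    return $arg_ref->{width} * $arg_ref->{height} * $arg_ref->{depth};
}

# Positional arguments: shorter, but callers must remember the order.
sub volume_positional {
    my ( $width, $height, $depth ) = @_;
    return $width * $height * $depth;
}

print volume_named( { depth => 4, width => 2, height => 3 } ), "\n";    # 24
print volume_positional( 2, 3, 4 ), "\n";                               # 24
```

With three arguments the positional form is easy to read; named arguments start to pay off as the list grows or gains optional entries.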