How to clear a Perl hash - perl

Let's say we define an anonymous hash like this:
my $hash = {};
And then use the hash afterwards. Then it's time to empty or clear the hash for
reuse. After some Google searching, I found:
%{$hash} = ()
and:
undef %{$hash}
Both will serve my needs. What's the difference between the two? Are they both identical ways to empty a hash?

%$hash_ref = (); makes more sense than undef-ing the hash. Undef-ing the hash says that you're done with the hash. Assigning an empty list says you just want an empty hash.

Yes, they are absolutely identical. Both remove any existing keys and values from the table and sets the hash to the empty list.
See perldoc -f undef:
undef EXPR
undef Undefines the value of EXPR, which must be an lvalue. Use only
on a scalar value, an array (using "#"), a hash (using "%"), a
subroutine (using "&"), or a typeglob (using "*")...
Examples:
undef $foo;
undef $bar{'blurfl'}; # Compare to: delete $bar{'blurfl'};
undef #ary;
undef %hash;
However, you should not use undef to remove the value of anything except a scalar. For other variable types, set it to the "empty" version of that type -- e.g. for arrays or hashes, #foo = (); %bar = ();

Related

Does iterating over a hash reference require implicitly copying it in perl?

Lets say I have a large hash and I want to iterate over the contents of it contents. The standard idiom would be something like this:
while(($key, $value) = each(%{$hash_ref})){
///do something
}
However, if I understand my perl correctly this is actually doing two things. First the
%{$hash_ref}
is translating the ref into list context. Thus returning something like
(key1, value1, key2, value2, key3, value3 etc)
which will be stored in my stacks memory. Then the each method will run, eating the first two values in memory (key1 & value1) and returning them to my while loop to process.
If my understanding of this is right that means that I have effectively copied my entire hash into my stacks memory only to iterate over the new copy, which could be expensive for a large hash, due to the expense of iterating over the array twice, but also due to potential cache hits if both hashes can't be held in memory at once. It seems pretty inefficient. I'm wondering if this is what really happens, or if I'm either misunderstanding the actual behavior or the compiler optimizes away the inefficiency for me?
Follow up questions, assuming I am correct about the standard behavior.
Is there a syntax to avoid copying of the hash by iterating over it values in the original hash? If not for a hash is there one for the simpler array?
Does this mean that in the above example I could get inconsistent values between the copy of my hash and my actual hash if I modify the hash_ref content within my loop; resulting in $value having a different value then $hash_ref->($key)?
No, the syntax you quote does not create a copy.
This expression:
%{$hash_ref}
is exactly equivalent to:
%$hash_ref
and assuming the $hash_ref scalar variable does indeed contain a reference to a hash, then adding the % on the front is simply 'dereferencing' the reference - i.e. it resolves to a value that represents the underlying hash (the thing that $hash_ref was pointing to).
If you look at the documentation for the each function, you'll see that it expects a hash as an argument. Putting the % on the front is how you provide a hash when what you have is a hashref.
If you wrote your own subroutine and passed a hash to it like this:
my_sub(%$hash_ref);
then on some level you could say that the hash had been 'copied', since inside the subroutine the special #_ array would contain a list of all the key/value pairs from the hash. However even in that case, the elements of #_ are actually aliases for the keys and values. You'd only actually get a copy if you did something like: my #args = #_.
Perl's builtin each function is declared with the prototype '+' which effectively coerces a hash (or array) argument into a reference to the underlying data structure.
As an aside, starting with version 5.14, the each function can also take a reference to a hash. So instead of:
($key, $value) = each(%{$hash_ref})
You can simply say:
($key, $value) = each($hash_ref)
No copy is created by each (though you do copy the returned values into $key and $value through assignment). The hash itself is passed to each.
each is a little special. It supports the following syntaxes:
each HASH
each ARRAY
As you can see, it doesn't accept an arbitrary expression. (That would be each EXPR or each LIST). The reason for that is to allow each(%foo) to pass the hash %foo itself to each rather than evaluating it in list context. each can do that because it's an operator, and operators can have their own parsing rules. However, you can do something similar with the \% prototype.
use Data::Dumper;
sub f { print(Dumper(#_)); }
sub g(\%) { print(Dumper(#_)); } # Similar to each
my %h = (a=>1, b=>2);
f(%h); # Evaluates %h in list context.
print("\n");
g(%h); # Passes a reference to %h.
Output:
$VAR1 = 'a'; # 4 args, the keys and values of the hash
$VAR2 = 1;
$VAR3 = 'b';
$VAR4 = 2;
$VAR1 = { # 1 arg, a reference to the hash
'a' => 1,
'b' => 2
};
%{$h_ref} is the same as %h, so all of the above applies to %{$h_ref} too.
Note that the hash isn't copied even if it is flattened. The keys are "copied", but the values are returned directly.
use Data::Dumper;
my %h = (abc=>"def", ghi=>"jkl");
print(Dumper(\%h));
$_ = uc($_) for %h;
print(Dumper(\%h));
Output:
$VAR1 = {
'abc' => 'def',
'ghi' => 'jkl'
};
$VAR1 = {
'abc' => 'DEF',
'ghi' => 'JKL'
};
You can read more about this here.

How do I define an anonymous scalar ref in Perl?

How do I properly define an anonymous scalar ref in Perl?
my $scalar_ref = ?;
my $array_ref = [];
my $hash_ref = {};
If you want a reference to some mutable storage, there's no particularly neat direct syntax for it. About the best you can manage is
my $var;
my $sref = \$var;
Or neater
my $sref = \my $var;
Or if you don't want the variable itself to be in scope any more, you can use a do block:
my $sref = do { \my $tmp };
At this point you can pass $sref around by value, and any mutations to the scalar it references will be seen by others.
This technique of course works just as well for array or hash references, just that there's neater syntax for doing that with [] and {}:
my $aref = do { \my #tmp }; ## same as my $aref = [];
my $href = do { \my %tmp }; ## same as my $href = {};
Usually you just declare and don't initialize it.
my $foo; # will be undef.
You have to consider that empty hash refs and empty array refs point to a data structure that has a representation. Both of them, when dereferenced, give you an empty list.
perldata says (emphasis mine):
There are actually two varieties of null strings (sometimes referred to as "empty" strings), a defined one and an undefined one. The defined version is just a string of length zero, such as "" . The undefined version is the value that indicates that there is no real value for something, such as when there was an error, or at end of file, or when you refer to an uninitialized variable or element of an array or hash. Although in early versions of Perl, an undefined scalar could become defined when first used in a place expecting a defined value, this no longer happens except for rare cases of autovivification as explained in perlref. You can use the defined() operator to determine whether a scalar value is defined (this has no meaning on arrays or hashes), and the undef() operator to produce an undefined value.
So an empty scalar (which it didn't actually say) would be undef. If you want it to be a reference, make it one.
use strict;
use warnings;
use Data::Printer;
my $scalar_ref = \undef;
my $scalar = $$scalar_ref;
p $scalar_ref;
p $scalar;
This will output:
\ undef
undef
However, as ikegami pointed out, it will be read-only because it's not a variable. LeoNerd provides a better approach for this in his answer.
Anyway, my point is, an empty hash ref and an empty array ref when dereferenced both contain an empty list (). And that is not undef but nothing. But there is no nothing as a scalar value, because everything that is not nothing is a scalar value.
my $a = [];
say ref $r; # ARRAY
say scalar #$r; # 0
say "'#$r'"; # ''
So there is no real way to initialize with nothing. You can only not initialize. But Moose will turn it to undef anyway.
What you could do is make it maybe a scalar ref.
use strict;
use warnings;
use Data::Printer;
{
package Foo;
use Moose;
has bar => (
is => 'rw',
isa => 'Maybe[ScalarRef]',
predicate => 'has_bar'
);
}
my $foo = Foo->new;
p $foo->has_bar;
p $foo;
say $foo->bar;
Output:
""
Foo {
Parents Moose::Object
public methods (3) : bar, has_bar, meta
private methods (0)
internals: {}
}
Use of uninitialized value in say at scratch.pl line 268.
The predicate gives a value that is not true (the empty string ""). undef is also not true. The people who made Moose decided to go with that, but it really doesn't matter.
Probably what you want is not have a default value, but just make it a ScalarRef an required.
Note that perlref doesn't say anything about initializing an empty scalar ref either.
I'm not entirely sure why you need to but I'd suggest:
my $ref = \undef;
print ref $ref;
Or perhaps:
my $ref = \0;
#LeoNerd's answer is spot on.
Another option is to use a temporary anonymous hash value:
my $scalar_ref = \{_=>undef}->{_};
$$scalar_ref = "Hello!\n";
print $$scalar_ref;

Anonymous Hash Slices - syntax?

I love hash slices and use them frequently:
my %h;
#h{#keys}=#vals;
Works brilliantly! But 2 things have always vexed me.
First, is it possible to combine the 2 lines above into a single line of code? It would be nice to declare the hash and populate it all at once.
Second, is it possible to slice an existing anonymous hash... something like:
my $slice=$anonh->{#fields}
First question:
my %h = map { $keys[$_] => $vals[$_] } 0..$#keys;
or
use List::MoreUtils qw( mesh );
my %h = mesh #keys, #vals;
Second question:
If it's ...NAME... for a hash, it's ...{ $href }... for a hash ref, so
my #slice = #hash{#fields};
is
my #slice = #{ $anonh }{#fields};
The curlies are optional if the reference expression is a variable.
my #slice = #$anonh{#fields};
Mini-Tutorial: Dereferencing Syntax
References quick reference
perlref
perlreftut
perldsc
perllol
For your first question, to do it in a single line of code:
#$_{#keys}=#vals for \my %h;
or
map #$_{#keys}=#vals, \my %h;
but I wouldn't do that; it's a confusing way to write it.
Either version declares the variable and immediately takes a reference to it and aliases $_ to the reference so that the hash reference can be used in a slice. This lets you declare the variable in the existing scope; #{ \my %h }{#keys} = #vals; also "works", but has the unfortunate drawback of scoping %h to that tiny block in the hash slice.
For your second question, as shown above, slices can be used on hash references; see http://perlmonks.org/?node=References+quick+reference for some easy to remember rules.
my #slice = #$anonh{#fields};
or maybe you meant:
my $slice = [ #$anonh{#fields} ];
but #slice/$slice there is a copy of the values. To get an array of aliases to the hash values, you can do:
my $slice = sub { \#_ }->( #$anonh{#fields} );
Hash slice syntax is
# <hash-name-or-hash-ref> { LIST }
When you are slicing a hash reference, enclose it in curly braces so it doesn't get dereferenced as an array. This gives you:
my #values = #{$anonh}{#fields}
for a hash reference $anonh.

Subroutine that returns hash - breaks it into separate variables

I have a subroutine that returns a hash. Last lines of the subroutine:
print Dumper(\%fileDetails);
return %fileDetails;
in this case the dumper prints:
$VAR1 = {
'somthing' => 0,
'somthingelse' => 7.68016712043654,
'else' => 'burst'
}
But when I try to dump it calling the subroutine with this line:
print Dumper(\fileDetailsSub($files[$i]));
the dumper prints:
$VAR1 = \'somthing';
$VAR2 = \0;
$VAR3 = \'somthingelse';
$VAR4 = \7.68016712043654;
$VAR5 = \'else';
$VAR6 = \'burst';
Once the hash is broken, I can't use it anymore.
Why does it happen? And how can I preserve the proper structure on subroutine return?
Thanks,
Mark.
There's no such thing as returning a hash in Perl.
Subroutines take lists as their arguments and they can return lists as their result. Note that a list is a very different creature from an array.
When you write
return %fileDetails;
This is equivalent to:
return ( 'something', 0, 'somethingelse', 7.68016712043654, 'else', 'burst' );
When you invoke the subroutine and get that list back, one thing you can do is assign it to a new hash:
my %result = fileDetailsSub();
That works because a hash can be initialized with a list of key-value pairs. (Remember that (foo => 42, bar => 43 ) is the same thing as ('foo', 42, 'bar', 43).
Now, when you use the backslash reference operator on a hash, as in \%fileDetails, you get a hash reference which is a scalar the points to a hash.
Similarly, if you write \#array, you get an array reference.
But when you use the reference operator on a list, you don't get a reference to a list (since lists are not variables (they are ephemeral), they can't be referenced.) Instead, the reference operator distributes over list items, so
\( 'foo', 'bar', 'baz' );
makes a new list:
( \'foo', \'bar', \'baz' );
(In this case we get a list full of scalar references.) And this is what you're seeing when you try to Dumper the results of your subroutine: a reference operator distributed over the list of items returned from your sub.
So, one solution is to assign the result list to an actual hash variable before using Dumper. Another is to return a hash reference (what you're Dumpering anyway) from the sub:
return \%fileDetails;
...
my $details_ref = fileDetailsSub();
print Dumper( $details_ref );
# access it like this:
my $elem = $details_ref->{something};
my %copy = %{ $details_ref };
For more fun, see:
perldoc perlreftut - the Perl reference tutorial, and
perldoc perlref - the Perl reference reference.
Why not return a reference to the hash instead?
return \%fileDetails;
As long as it is a lexical variable, it will not complicate things with other uses of the subroutine. I.e.:
sub fileDetails {
my %fileDetails;
... # assign stuff
return \%fileDetails;
}
When the execution leaves the subroutine, the variable goes out of scope, but the data contained in memory remains.
The reason the Dumper output looks like that is that you are feeding it a referenced list. Subroutines cannot return arrays or hashes, they can only return lists of scalars. What you are doing is something like this:
print Dumper \(qw(something 0 somethingelse 7.123 else burst));
Perl functions can not return hashes, only lists. A return %foo statement will flatten out %foo into a list and returns the flattened list. To get the return value to be interpreted as a hash, you can assign it to a named hash
%new_hash = fileDetailsSub(...);
print Dumper(\%new_hash);
or cast it (not sure if that is the best word for it) with a %{{...}} sequence of operations:
print Dumper( \%{ {fileDetailsSub(...)} } );
Another approach, as TLP points out, is to return a hash reference from your function.
You can't return a hash directly, but perl can automatically convert between hashes and lists as needed. So perl is converting that into a list, and you are capturing it as a list. i.e.
Dumper( filedetail() ) # list
my %fd = filedetail(); Dumper( \%fd ); #hash
In list context, Perl does not distinguish between a hash and a list of key/value pairs. That is, if a subroutine returns a hash, what it really returns is an list of (key1, value1, key2, value2...). Fortunately, that works both ways; if you take such a list and assign it to a hash, you get a faithful copy of the original:
my %fileDetailsCopy = subroutineName();
But if it wouldn't break other code, it would probably make more sense to have the sub return a reference to the hash instead, as TLP said.

Confusion about proper usage of dereference in Perl

I noticed the other day that - while altering values in a hash - that when you dereference a hash in Perl, you actually are making a copy of that hash. To confirm I wrote this quick little script:
#! perl
use warnings;
use strict;
my %h = ();
my $hRef = \%h;
my %h2 = %{$hRef};
my $h2Ref = \%h2;
if($hRef eq $h2Ref) {
print "\n\tThey're the same $hRef $h2Ref";
}
else {
print "\n\tThey're NOT the same $hRef $h2Ref";
}
print "\n\n";
The output:
They're NOT the same HASH(0x10ff6848) HASH(0x10fede18)
This leads me to realize that there could be spots in some of my scripts where they aren't behaving as expected. Why is it even like this in the first place? If you're passing or returning a hash, it would be more natural to assume that dereferencing the hash would allow me to alter the values of the hash being dereferenced. Instead I'm just making copies all over the place without any real need/reason to beyond making syntax a little more obvious.
I realize the fact that I hadn't even noticed this until now shows its probably not that big of a deal (in terms of the need to go fix in all of my scripts - but important going forward). I think its going to be pretty rare to see noticeable performance differences out of this, but that doesn't alter the fact that I'm still confused.
Is this by design in perl? Is there some explicit reason I don't know about for this; or is this just known and you - as the programmer - expected to know and write scripts accordingly?
The problem is that you are making a copy of the hash to work with in this line:
my %h2 = %{$hRef};
And that is understandable, since many posts here on SO use that idiom to make a local name for a hash, without explaining that it is actually making a copy.
In Perl, a hash is a plural value, just like an array. This means that in list context (such as you get when assigning to a hash) the aggregate is taken apart into a list of its contents. This list of pairs is then assembled into a new hash as shown.
What you want to do is work with the reference directly.
for (keys %$hRef) {...}
for (values %$href) {...}
my $x = $href->{some_key};
# or
my $x = $$href{some_key};
$$href{new_key} = 'new_value';
When working with a normal hash, you have the sigil which is either a % when talking about the entire hash, a $ when talking about a single element, and # when talking about a slice. Each of these sigils is then followed by an identifier.
%hash # whole hash
$hash{key} # element
#hash{qw(a b)} # slice
To work with a reference named $href simply replace the string hash in the above code with $href. In other words, $href is the complete name of the identifier:
%$href # whole hash
$$href{key} # element
#$href{qw(a b)} # slice
Each of these could be written in a more verbose form as:
%{$href}
${$href}{key}
#{$href}{qw(a b)}
Which is again a substitution of the string '$href' for 'hash' as the name of the identifier.
%{hash}
${hash}{key}
#{hash}{qw(a b)}
You can also use a dereferencing arrow when working with an element:
$hash->{key} # exactly the same as $$hash{key}
But I prefer the doubled sigil syntax since it is similar to the whole aggregate and slice syntax, as well as the normal non-reference syntax.
So to sum up, any time you write something like this:
my #array = #$array_ref;
my %hash = %$hash_ref;
You will be making a copy of the first level of each aggregate. When using the dereferencing syntax directly, you will be working on the actual values, and not a copy.
If you want a REAL local name for a hash, but want to work on the same hash, you can use the local keyword to create an alias.
sub some_sub {
my $hash_ref = shift;
our %hash; # declare a lexical name for the global %{__PACKAGE__::hash}
local *hash = \%$hash_ref;
# install the hash ref into the glob
# the `\%` bit ensures we have a hash ref
# use %hash here, all changes will be made to $hash_ref
} # local unwinds here, restoring the global to its previous value if any
That is the pure Perl way of aliasing. If you want to use a my variable to hold the alias, you can use the module Data::Alias
You are confusing the actions of dereferencing, which does not inherently create a copy, and using a hash in list context and assigning that list, which does. $hashref->{'a'} is a dereference, but most certainly does affect the original hash. This is true for $#$arrayref or values(%$hashref) also.
Without the assignment, just the list context %$hashref is a mixed beast; the resulting list contains copies of the hash keys but aliases to the actual hash values. You can see this in action:
$ perl -wle'$x={"a".."f"}; for (%$x) { $_=chr(ord($_)+10) }; print %$x'
epcnal
vs.
$ perl -wle'$x={"a".."f"}; %y=%$x; for (%y) { $_=chr(ord($_)+10) }; print %$x; print %y'
efcdab
epcnal
but %$hashref isn't acting any differently than %hash here.
No, dereferencing does not create a copy of the referent. It's my that creates a new variable.
$ perl -E'
my %h1; my $h1 = \%h1;
my %h2; my $h2 = \%h2;
say $h1;
say $h2;
say $h1 == $h2 ?1:0;
'
HASH(0x83b62e0)
HASH(0x83b6340)
0
$ perl -E'
my %h;
my $h1 = \%h;
my $h2 = \%h;
say $h1;
say $h2;
say $h1 == $h2 ?1:0;
'
HASH(0x9eae2d8)
HASH(0x9eae2d8)
1
No, $#{$someArrayHashRef} does not create a new array.
If perl did what you suggest, then variables would get aliased very easily, which would be far more confusing. As it is, you can alias variables with globbing, but you need to do so explicitly.