How can I make a new, empty hash reference in Perl? - perl

Say I had something like:
# %superhash is some predefined hash with more than 0 keys;
%hash = ();
foreach my $key (keys %superhash){
$superhash{ $key } = %hash;
%hash = ();
}
Would all the keys of superhash point to the same empty hash accessed by %hash or would they be different empty hashes?
If not, how can I make sure they point to empty hashes?

You need to use the \ operator to take a reference to a plural data type (array or hash) before you can store it into a single slot of either. But in the example code given, if referenced, each would be the same hash.
The way to initialize your data structure is:
foreach my $key (keys %superhash) {
$superhash{ $key } = {}; # New empty hash reference
}
But initialization like this is largely unnecessary in Perl due to autovivification (creating appropriate container objects when a variable is used as a container).
my %hash;
$hash{a}{b} = 1;
Now %hash has one key, 'a', which has a value of an anonymous hashref, containing the key/value pair b => 1. Arrays autovivify in the same manner.

Related

How to parse through many hashes using foreach?

foreach my %hash (%myhash1,%myhash2,%myhash3)
{
while (($keys,$$value) = each %hash)
{
#use key and value...
}
}
Why doesn't this work :
it says synta error on foreach line.
Pls tell me why is it wrong.
This is wrong because you seem to think that this allows you to access each hash as a separate construct, whereas what you are in fact doing is, besides a syntax error, accessing the hashes as a mixed-together new list. For example:
my %hash1 = qw(foo 1 bar 1);
my %hash2 = qw(abc 1 def 1);
for (%hash1, %hash2) # this list is now qw(foo 1 bar 1 abc 1 def 1)
When you place a hash (or array) in a list context statement, they are expanded into their elements, and their integrity is not preserved. Some built-in functions do allow this behaviour, but normal Perl code does not.
You also cannot assign a hash as the for iterator variable, that can only ever be a scalar value. What you can do is this:
for my $hash (\%myhash1, \%myhash2, \%myhash3) {
while (my ($key, $value) = each %$hash) {
...
Which is to say, you create a list of hash references and iterate over them. Note that you cannot tell the difference between the hashes with this approach.
Note also that I use my $hash because this variable must be a scalar.
The syntax should be like:
my $hash1 = {'a'=>1};
my $hash2 = {'b'=>1};
my #arr2 = ($hash1, $hash2);
foreach $hash (#arr2)
{
while(($key, $value) = each %$hash)
{
print $key, $value;
}
}
you need to reference and then dereference the hash.

How to use a 'subroutine reference' as a hash key

In Perl, I'm learning how to dereference 'subroutine references'. But I can't seem to use a subroutine reference as a hash 'key'.
In the following sample code,
I can create a reference to a subroutine ($subref) and then dereference it to run the subroutine (&$subref)
I can use the reference as a hash 'value' and then easily dereference that
But I cannot figure out how to use the reference as a hash 'key'. When I pull the key out of the hash, Perl interprets the key as a string value (not a reference) - which I now understand (thanks to this site!). So I've tried Hash::MultiKey, but that seems to turn it into an array reference. I want to treat it as a subroutine/code reference, assuming this is somehow possible?
Any other ideas?
use strict;
#use diagnostics;
use Hash::MultiKey;
my $subref = \&hello;
#1:
&$subref('bob','sue'); #okay
#2:
my %hash;
$hash{'sayhi'}=$subref;
&{$hash{'sayhi'}}('bob','sue'); #okay
#3:
my %hash2;
tie %hash2, 'Hash::MultiKey';
$hash2{$subref}=1;
foreach my $key (keys %hash2) {
print "Ref type is: ". ref($key)."\n";
&{$key}('bob','sue'); # Not okay
}
sub hello {
my $name=shift;
my $name2=shift;
print "hello $name and $name2\n";
}
This is what is returned:
hello bob and sue
hello bob and sue
Ref type is: ARRAY
Not a CODE reference at d:\temp\test.pl line 21.
That is correct, a normal hash key is only a string. Things that are not strings get coerced to their string representation.
my $coderef = sub { my ($name, $name2) = #_; say "hello $name and $name2"; };
my %hash2 = ( $coderef => 1, );
print keys %hash2; # 'CODE(0x8d2280)'
Tieing is the usual means to modify that behaviour, but Hash::MultiKey does not help you, it has a different purpose: as the name says, you may have multiple keys, but again only simple strings:
use Hash::MultiKey qw();
tie my %hash2, 'Hash::MultiKey';
$hash2{ [$coderef] } = 1;
foreach my $key (keys %hash2) {
say 'Ref of the key is: ' . ref($key);
say 'Ref of the list elements produced by array-dereferencing the key are:';
say ref($_) for #{ $key }; # no output, i.e. simple strings
say 'List elements produced by array-dereferencing the key are:';
say $_ for #{ $key }; # 'CODE(0x8d27f0)'
}
Instead, use Tie::RefHash. (Code critique: prefer this syntax with the -> arrow for dereferencing a coderef.)
use 5.010;
use strict;
use warnings FATAL => 'all';
use Tie::RefHash qw();
my $coderef = sub {
my ($name, $name2) = #_;
say "hello $name and $name2";
};
$coderef->(qw(bob sue));
my %hash = (sayhi => $coderef);
$hash{sayhi}->(qw(bob sue));
tie my %hash2, 'Tie::RefHash';
%hash2 = ($coderef => 1);
foreach my $key (keys %hash2) {
say 'Ref of the key is: ' . ref($key); # 'CODE'
$key->(qw(bob sue));
}
From perlfaq4:
How can I use a reference as a hash key?
(contributed by brian d foy and Ben Morrow)
Hash keys are strings, so you can't really use a reference as the key.
When you try to do that, perl turns the reference into its stringified
form (for instance, HASH(0xDEADBEEF) ). From there you can't get back
the reference from the stringified form, at least without doing some
extra work on your own.
Remember that the entry in the hash will still be there even if the
referenced variable goes out of scope, and that it is entirely
possible for Perl to subsequently allocate a different variable at the
same address. This will mean a new variable might accidentally be
associated with the value for an old.
If you have Perl 5.10 or later, and you just want to store a value
against the reference for lookup later, you can use the core
Hash::Util::Fieldhash module. This will also handle renaming the keys
if you use multiple threads (which causes all variables to be
reallocated at new addresses, changing their stringification), and
garbage-collecting the entries when the referenced variable goes out
of scope.
If you actually need to be able to get a real reference back from each
hash entry, you can use the Tie::RefHash module, which does the
required work for you.
So it looks like Tie::RefHash will do what you want. But to be honest, I don't think that what you want to do is a particularly good idea.
Why do you need it? If you e.g. need to store parameters to the functions in the hash, you can use HoH:
my %hash;
$hash{$subref} = { sub => $subref,
arg => [qw/bob sue/],
};
foreach my $key (keys %hash) {
print "$key: ref type is: " . ref($key) . "\n";
$hash{$key}{sub}->( #{ $hash{$key}{arg} } );
}
But then, you can probably choose a better key anyway.

three questions on a Perl function

I am trying to use an existing Perl program, which includes the following function of GetItems. The way to call this function is listed in the following.
I have several questions for this program:
what does foreach my $ref (#_) aim to do? I think #_ should be related to the parameters passed, but not quite sure.
In my #items = sort { $a <=> $b } keys %items; the "items" on the left side should be different from the "items" on the right side? Why do they use the same name?
What does $items{$items[$i]} = $i + 1; aim to do? Looks like it just sets up the value for the hash $items sequentially.
$items = GetItems($classes, $pVectors, $nVectors, $uVectors);
######################################
sub GetItems
######################################
{
my $classes = shift;
my %items = ();
foreach my $ref (#_)
{
foreach my $id (keys %$ref)
{
foreach my $cui (keys %{$ref->{$id}}) { $items{$cui} = 1 }
}
}
my #items = sort { $a <=> $b } keys %items;
open(VAL, "> $classes.items");
for my $i (0 .. $#items)
{
print VAL "$items[$i]\n";
$items{$items[$i]} = $i + 1;
}
close VAL;
return \%items;
}
When you enter a function, #_ starts out as an array of (aliases to) all the parameters passed into the function; but the my $classes = shift removes the first element of #_ and stores it in the variable $classes, so the foreach my $ref (#_) iterates over all the remaining parameters, storing (aliases to) them one at a time in $ref.
Scalars, hashes, and arrays are all distinguished by the syntax, so they're allowed to have the same name. You can have a $foo, a #foo, and a %foo all at the same time, and they don't have to have any relationship to each other. (This, together with the fact that $foo[0] refers to #foo and $foo{'a'} refers to %foo, causes a lot of confusion for newcomers to the language; you're not alone.)
Exactly. It sets each element of %items to a distinct integer ranging from one to the number of elements, proceeding in numeric (!) order by key.
foreach my $ref (#_) loops through each hash reference passed as a parameter to GetItems. If the call looks like this:
$items = GetItems($classes, $pVectors, $nVectors, $uVectors);
then the loop processes the hash refs in $pVector, $nVectors, and $uVectors.
#items and %items are COMPLETELY DIFFERENT VARIABLES!! #items is an array variable and %items is a hash variable.
$items{$items[$i]} = $i + 1 does exactly as you say. It sets the value of the %items hash whose key is $items[$i] to $i+1.
Here is an (nearly) line by line description of what is happening in the subroutine
Define a sub named GetItems.
sub GetItems {
Store the first value in the default array #_, and remove it from the array.
my $classes = shift;
Create a new hash named %items.
my %items;
Loop over the remaining values given to the subroutine, setting $ref to the value on each iteration.
for my $ref (#_){
This code assumes that the previous line set $ref to a hash ref. It loops over the unsorted keys of the hash referenced by $ref, storing the key in $id.
for my $id (keys %$ref){
Using the key ($id) given by the previous line, loop over the keys of the hash ref at that position in $ref. While also setting the value of $cui.
for my $cui (keys %{$ref->{$id}}) {
Set the value of %item at position $cui, to 1.
$items{$cui} = 1;
End of the loops on the previous lines.
}
}
}
Store a sorted list of the keys of %items in #items according to numeric value.
my #items = sort { $a <=> $b } keys %items;
Open the file named by $classes with .items appended to it. This uses the old-style two arg form of open. It also ignores the return value of open, so it continues on to the next line even on error. It stores the file handle in the global *VAL{IO}.
open(VAL, "> $classes.items");
Loop over a list of indexes of #items.
for my $i (0 .. $#items){
Print the value at that index on it's own line to *VAL{IO}.
print VAL "$items[$i]\n";
Using that same value as an index into %items (which it is a key of) to the index plus one.
$items{$items[$i]} = $i + 1;
End of loop.
}
Close the file handle *VAL{IO}.
close VAL;
Return a reference to the hash %items.
return \%items;
End of subroutine.
}
I have several questions for this program:
What does foreach my $ref (#_) aim to do? I think #_ should be related to the parameters passed, but not quite sure.
Yes, you are correct. When you pass parameters into a subroutine, they automatically are placed in the #_ array. (Called a list in Perl). The foreach my $ref (#_) begins a loop. This loop will be repeated for each item in the #_ array, and each time, the value of $ref will be assigned the next item in the array. See Perldoc's Perlsyn (Perl Syntax) section about for loops and foreach loops. Also look at Perldoc's Perlvar (Perl Variables) section of General variables for information about special variables like #_.
Now, the line my $classes = shift; is removing the first item in the #_ list and putting it into the variable $classes. Thus, the foreach loop will be repeated three times. Each time, $ref will be first set to the value of $pVectors, $nVectors, and finally $uVectors.
By the way, these aren't really scalar values. In Perl, you can have what is called a reference. This is the memory location of the data structure you're referencing. For example, I have five students, and each student has a series of tests they've taken. I want to store all the values of each test in a hash keyed by the student's ID.
Normally, each entry in the hash can only contain a single item. However, what if this item refers to a list that contains the student's grades?
Here's the list of student #100's grade:
#grades = (100, 93, 89, 95, 74);
And here's how I set Student 100's entry in my hash:
$student{100} = \#grades;
Now, I can talk about the first grade of the year for Student #100 as $student{100}[0]. See the Perldoc's Mark's very short tutorial about references.
In my #items = sort { $a <=> $b } keys %items; the "items" on the left side should be different from the "items" on the right side? Why do they use the same name?
In Perl, you have three major types of variables: Lists (what some people call Arrays), Hashes (what some people call Keyed Arrays), and Scalars. In Perl, it is perfectly legal to have different variable types have the same name. Thus, you can have $var, %var, and #var in your program, and they'll be treated as completely separate variables1.
This is usually a bad thing to do and is highly discouraged. It gets worse when you think of the individual values: $var refers to the scalar while $var[3] refers to the list, and $var{3} refers to the hash. Yes, it can be very, very confusing.
In this particular case, he has a hash (a keyed array) called %item, and he's converting the keys in this hash into a list sorted by the keys. This syntax could be simplified from:
my #items = sort { $a <=> $b } keys %items;
to just:
my #items = sort keys %items;
See the Perldocs on the sort function and the keys function.
What does $items{$items[$i]} = $i + 1; aim to do? Looks like it just sets up the value for the hash $items sequentially.
Let's look at the entire loop:
foreach my $i (0 .. $#items)
{
print VAL "$items[$i]\n";
$items{$items[$i]} = $i + 1;
}
The subroutine is going to loop through this loop once for each item in the #items list. This is the sorted list of keys to the old %items hash. The $#items means the largest index in the item list. For example, if #items = ("foo", "bar", and "foobar"), then $#item would be 2 because the last item in this list is $item[2] which equals foobar.
This way, he's hitting the index of each entry in #items. (REMEMBER: This is different from %item!).
The next line is a bit tricky:
$items{$items[$i]} = $i + 1;
Remember that $item{} refers to the old %items hash! He's creating a new %items hash. This is being keyed by each item in the #items list. And, the value is being set to the index of that item plus 1. Let's assume that:
#items = ("foo", "bar", "foobar")
In the end, he's doing this:
$item{foo} = 1;
$item{bar} = 2;
$item{foobar} = 3;
1 Well, this isn't 100% true. Perl stores each variable in a kind of hash structure. In memory, $var, #var, and %var will be stored in the same hash entry in memory, but in positions related to each variable type. 99.9999% of the time, this matters not one bit. As far as you are concerned, these are three completely different variables.
However, there are a few rare occasions where a programmer will take advantage of this when they futz directly with memory in Perl.
I want to show you how I would write that subroutine.
Bur first, I want to show you some of the steps of how, and why, I changed the code.
Reduce the number of for loops:
First off this loop doesn't need to set the value of $items{$cui} to anything in particular. It also doesn't have to be a loop at all.
foreach my $cui (keys %{$ref->{$id}}) { $items{$cui} = 1 }
This does practically the same thing. The only real difference is it sets them all to undef instead.
#items{ keys %{$ref->{$id}} } = ();
If you really needed to set the values to 1. Note that (1)x#keys returns a list of 1's with the same number of elements in #keys.
my #keys = keys %{$ref->{$id}};
#items{ #keys } = (1) x #keys;
If you are going to have to loop over a very large number of elements then a for loop may be a good idea, but only if you have to set the value to something other than undef. Since we are only using the loop variable once, to do something simple; I would use this code:
$items{$_} = 1 for keys %{$ref->{$id}};
Swap keys with values:
On the line before that we see:
foreach my $id (keys %$ref){
In case you didn't notice $id was used only once, and that was for getting the associated value.
That means we can use values and get rid of the %{$ref->{$id}} syntax.
for my $hash (values %$ref){
#items{ keys %$hash } = ();
}
( $hash isn't a good name, but I don't know what it represents. )
3 arg open:
It isn't recommended to use the two argument form of open, or to blindly use the bareword style of filehandles.
open(VAL, "> $classes.items");
As an aside, did you know there is also a one argument form of open. I don't really recommend it though, it's mostly there for backward compatibility.
our $VAL = "> $classes.items";
open(VAL);
The recommend way to do it, is with 3 arguments.
open my $val, '>', "$classes.items";
There may be some rare edge cases where you need/want to use the two argument version though.
Put it all together:
sub GetItems {
# this will cause open and close to die on error (in this subroutine only)
use autodie;
my $classes = shift;
my %items;
for my $vector_hash (#_){
# use values so that we don't have to use $ref->{$id}
for my $hash (values %$ref){
# create the keys in %items
#items{keys %$hash} = ();
}
}
# This assumes that the keys of %items are numbers
my #items = sort { $a <=> $b } keys %items;
# using 3 arg open
open my $output, '>', "$classes.items";
my $index; # = 0;
for $item (#items){
print {$output} $item, "\n";
$items{$item} = ++$index; # 1...
}
close $output;
return \%items;
}
Another option for that last for loop.
for my $index ( 1..#items ){
my $item = $items[$index-1];
print {$output} $item, "\n";
$items{$item} = $index;
}
If your version of Perl is 5.12 or newer, you could write that last for loop like this:
while( my($index,$item) = each #items ){
print {$output} $item, "\n";
$items{$item} = $index + 1;
}

Reversing a hash, getting its keys, and sorting

I'm still learning Perl, so there's probably a more efficient way of doing this. I'm trying to take a hash, reverse it so $values => $keys, grab the new key (the old value) and then sort those keys.
Here's the code in question:
foreach my $key (sort keys reverse %hash){
...}
What I'm expecting to happen is that reverse %hash will return a hash type, which is what keys is looking for. However, I get the following error:
Type of arg 1 to keys must be hash (not reverse)
I've tried putting parentheses around reverse %hash, but still get the same thing.
Any ideas why this wouldn't work?
Perl functions can either return scalar values or lists; there is no explicit hash return type
(You can call return %hash from a subroutine, but Perl implicitly unrolls the key-value pairs from the hash and returns them as a list).
Therefore, the return value of reverse %hash is a list, not a hash, and not suitable for use as an argument to keys. You can force Perl to interpret this list as a hash with the %{{}} cast:
foreach my $key (sort keys %{{reverse %hash}}) { ...
You could also sort on the values of the hash by saying
foreach my $key (sort values %hash) { ...
Using values %hash is subtly different from using keys %{{reverse %hash}} in that keys %{{reverse %hash}} will not return any duplicate values.
The argument to keys must be a hash, array or expression, not a list. If you did
keys { reverse %hash }
you would get the result you expect, because the brackets create an anonymous hash. Parens, on the other hand, only change the precedence. Or, in this case they are probably considered related to the function keys(), as most perl functions have optional parens.
Also, if you just want the values of the hash, you can use:
values %hash
See the documentation for reverse, values and keys for more info.
I think you're describing exactly the situation in this example:
#!/usr/local/bin/perl
use strict;
use warnings;
my %hash = (one => 1, two => 2, three => 3, four => 4);
%hash = reverse %hash;
foreach my $key (sort {$a <=> $b} keys %hash) {
print "$key=>$hash{$key}, ";
}
print "\n";
# it displays: 1=>one, 2=>two, 3=>three, 4=>four
Pre 5.14, keys returns the keys of hash. That requires a hash. You didn't provide one. reverse doesn't return a hash. In fact, it's impossible to return a hash as only scalars can be returned. (Internally, Perl can put hashes directly on the stack, but that will never be visible to the user without causing a "Bizarre" error message.) This error is detected at compile-time.
5.14 is more flexible. It will also accept a reference to a hash. (It will also accept arrays and references to arrays, but that's not relevant here.) References are scalars, so they can be returned by functions. Your code will actually make it to run-time, but whatever your reverse returns in scalar context won't a reference to a hash that doesn't exist, so your code will die at that point.
Do you have a reason to want to reverse the hash?
foreach my $key (sort { $hash{$a} cmp $hash{$b} } keys %hash) {
my $val = $hash{$key};
...
}
If you do,
foreach my $val (sort keys %{ { reverse %hash } }) {
# No access to original key
...
}
or
my %flipped = reverse %hash;
foreach my $val (sort keys %flipped) {
my $key = $flipped{$val};
...
}

How can I pass a hash to a Perl subroutine?

In one of my main( or primary) routines,I have two or more hashes. I want the subroutine foo() to recieve these possibly-multiple hashes as distinct hashes. Right now I have no preference if they go by value, or as references. I am struggling with this for the last many hours and would appreciate help, so that I dont have to leave perl for php! ( I am using mod_perl, or will be)
Right now I have got some answer to my requirement, shown here
From http://forums.gentoo.org/viewtopic-t-803720-start-0.html
# sub: dump the hash values with the keys '1' and '3'
sub dumpvals
{
foreach $h (#_)
{
print "1: $h->{1} 3: $h->{3}\n";
}
}
# initialize an array of anonymous hash references
#arr = ({1,2,3,4}, {1,7,3,8});
# create a new hash and add the reference to the array
$t{1} = 5;
$t{3} = 6;
push #arr, \%t;
# call the sub
dumpvals(#arr);
I only want to extend it so that in dumpvals I could do something like this:
foreach my %k ( keys #_[0]) {
# use $k and #_[0], and others
}
The syntax is wrong, but I suppose you can tell that I am trying to get the keys of the first hash ( hash1 or h1), and iterate over them.
How to do it in the latter code snippet above?
I believe this is what you're looking for:
sub dumpvals {
foreach my $key (keys %{$_[0]}) {
print "$key: $_[0]{$key}\n";
}
}
An element of the argument array is a scalar, so you access it as $_[0] not #_[0].
keys operates on hashes, not hash refs, so you need to dereference, using %
And of course, the keys are scalars, not hashes, so you use my $key, not my %key.
To have dumpvals dump the contents of all hashes passed to it, use
sub dumpvals {
foreach my $h (#_) {
foreach my $k (keys %$h) {
print "$k: $h->{$k}\n";
}
}
}
Its output when called as in your question is
1: 2
3: 4
1: 7
3: 8
1: 5
3: 6