Perl: hash as argument / return / reuse

I want to fill a hash table incrementally by calling a function several times. The function takes a hash reference as an argument, fills it, and returns it. The returned hash is then passed to the function again.
It seems that the hash is not filled at all.
Can someone tell me where the error might be, please? Here is my code:
sub extractMLResult {
my (%h1, %h2, %h3, %h4, $param) = @_;
my $h1= shift;
my $h2= shift;
my $h3= shift;
my $h4=shift;
$params= shift;
# read csv file, split it, fill hashes with values
$h1->{$key1}{$key2}{'p'}=$val1;
# ... do the same for the other hashes ...
return (%$h1, %$h2, %$h3, %$h4);
}
my %myhash = ();
my %h1= ();
my %h2= ();
my %h3= ();
my %h4= ();
$myhash{'a'}{'x'}=1;
$myhash{'b'}{'y'}=1;
if (exists $myhash{'a'}){
(%h1, %h2, %h3, %h4) = extractMLResult(\%h1, \%h2, \%h3, \%h4, 'a');
}
if (exists $myhash{'b'}){
(%h1, %h2, %h3, %h4) = extractMLResult(\%h1, \%h2, \%h3, \%h4, 'b');
}

my declares variables in the lexical scope. So the instant you exit your 'if' clause, %$h1 etc. vanish again.
Also, you're doing some strange things with the assignment, which I don't think will work the way you think: you're dereferencing your hash references, and as such returning a list of values.
Those will all be going into %$h1 because of the way list assignments work.
But on the flip side - when you're reading in myfunction your assignment probably isn't doing what you think.
Because you're calling myfunction and passing a list of values, but you're doing a list assignment for %h1. That means all your arguments are 'consumed':
#!/usr/bin/env perl
use strict;
use warnings;
use Data::Dumper;
sub myfunction {
my (%h1, %h2, %h3, %h4, $param) = @_;
print Dumper \%h1;
print Dumper \%h2;
}
# launch function :
my %h1 = ( a => "test" );
my %h2 = ( b => "wibble" );
myfunction ( \%h1, \%h2 );
As you will see - your arguments are all consumed by the assignment to %h1 and none are left for the rest of your assignments.
More importantly - your code doesn't even compile, because if you do this:
my (%$h1, %$h2, %$h3, %$h4) = myfunction (\%h1, \%h2, \%h3, \%h4, "a");
You get:
Can't declare hash dereference in "my"
So perhaps give us some sample code that actually illustrates the problem - and runs, with some sample data?
Edit: With the more code - the problem is right here:
sub extractMLResult {
my (%h1, %h2, %h3, %h4, $param) = @_;
That's not doing what you think it's doing. Because %h1 is a hash and it's assigned in list context, all the arguments in @_ are inserted into it. So %h2, %h3, %h4 and $param will always be empty/undefined.
You don't indicate whether you're actually using %h1 though, which just means it's confusing nonsense - potentially.
But this bit:
my $h1= shift;
my $h2= shift;
my $h3= shift;
my $h4_parents = shift;
OK, so you're extracting some hash references here, which is perhaps a little more sane. But naming them the same as the hashes is confusing: there's NO relationship between $h1 and %h1, and you'll confuse things in your code if you do that. (Because $h1{key} refers to %h1 and has nothing to do with $h1.)
But the same problem exists in return:
(%h1, %h2, %h3, %h4) = extractMLResult(\%h1, \%h2, \%h3, \%h4, 'a');
Because your return:
return (%$h1, %$h2, %$h3, %$h4);
This return will give you back an unrolled list containing all the elements in the hashes. But given the way you're packing the hashes, they'll probably be a partially unrolled list, containing hash-references.
But then, they'll all be consumed by %h1 again, in the assignment.
You would need to:
return ( $h1, $h2, $h3, $h4);
And then in the calling code:
my ( $h1_ret, $h2_ret, $h3_ret, $h4_ret ) = extractMLResult(\%h1, \%h2, \%h3, \%h4, 'a');
And then unpack:
%h1 = %$h1_ret;
Or just stick with working with references all the way through, which is probably clearer for all concerned.
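As a minimal sketch of the "references all the way through" approach: the sub below is a hypothetical stand-in for extractMLResult (the name fill_result and the count key are invented for illustration, and the CSV parsing is omitted). It fills the caller's hash in place through the reference, so nothing needs to be copied back.

```perl
use strict;
use warnings;

# Hypothetical stand-in for extractMLResult: takes a hash reference and a
# parameter, updates the referenced hash in place, and returns the reference.
sub fill_result {
    my ($result, $param) = @_;      # scalars, so nothing is consumed greedily
    $result->{$param}{count}++;     # modifies the caller's hash directly
    return $result;
}

my %h1;
fill_result(\%h1, 'a');
fill_result(\%h1, 'b');
fill_result(\%h1, 'a');

# The same hash has accumulated the results of all three calls.
print join(', ', map { "$_=$h1{$_}{count}" } sort keys %h1), "\n";   # a=2, b=1
```

Because the sub works on the reference, the return value can be ignored; returning it is only a convenience for chaining.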

You are passing hash references into your subroutine. This is a good idea. But inside your subroutine, you are treating your parameters as hashes (not hash references). This is a bad idea.
A hash is initialised from a list. It should be a list with an even number of elements. Each pair of elements in the list will become a key/value pair in the hash.
my %french = ('one', 'un', 'two', 'deux', 'three', 'trois');
We often use the "fat comma" to emphasise the link between keys and values.
my %french = (one => 'un', two => 'deux', three => 'trois');
This means that hash initialisation is a greedy operation. It will use up all of any list that it is given. You cannot initialise two hashes in the same statement:
my (%french, %german) = (one => 'un', two => 'deux',
three => 'drei', four => 'vier');
This doesn't work, as all of the pairs will end up in %french, leaving nothing to populate %german.
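A quick way to see this for yourself is to count the keys after such an assignment. This is only an illustrative check; the variable names $french_count and $german_count are not from the original question.

```perl
use strict;
use warnings;

# Both hashes are declared in one list assignment, as in the example above.
my (%french, %german) = (one => 'un', two => 'deux',
                         three => 'drei', four => 'vier');

# keys() in scalar context returns the number of keys.
my $french_count = keys %french;
my $german_count = keys %german;

print "french has $french_count keys, german has $german_count keys\n";
# french has 4 keys, german has 0 keys
```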
This is the same mistake that you are making when extracting the parameters within your subroutine. You have this:
my (%h1, %h2, %h3, %h4, $param) = @_;
Nothing will end up in %h2, %h3, %h4 or $param, as the assignment to %h1 is greedy and will take all of the data values from @_, leaving nothing for the other variables.
But, as you are passing hash references, your code shouldn't look like that. A hash reference is a scalar value (that's pretty much the point of them) so it is stored in a scalar variable.
What you want is this:
# Note, scalars ($), not hashes (%)
my ($h1, $h2, $h3, $h4, $param) = @_;
This should get you started. Note also, that you'll now need to deal with hash references ($h1->{key}) rather than hashes ($h1{key}).
And, please, always include both use strict and use warnings.

Related

What does this code do in Perl: keys(%$hash) ...?

print "Enter the hash \n";
$hash=<STDIN>;chop($hash);
@keys = keys (%$hash);
@values = values (%$hash);
Since Google ignores special characters, there was no way I could find out what the "%$hash" thing does and how this is supposed to work.
keys(%$hash) returns the keys of the hash referenced by the value in $hash. A hash is a type of associative array, which (more or less) means an array that's indexed by strings (called "keys") instead of by numbers.
In this particular case, $hash contains a string. When one uses a string as a reference, dereferencing it accesses the package variable whose name matches the string.
If the full program is
%FOO = ( a=>1, b=>2 );
%BAR = ( c=>3, d=>4 );
print "Enter the hash \n";
$hash=<STDIN>;chop($hash);
@keys = keys(%$hash);
Then,
@keys will contain a and b if the user enters FOO.
@keys will contain c and d if the user enters BAR.
@keys will contain E2BIG, EACCES, EADDRINUSE and many more if the user enters !.
@keys may contain paths if the user enters INC.
@keys will be empty for most other values.
(The keys are returned in an arbitrary order.)
The last three cases are surely unintentional. This is why the posted code is awful code. This is what the code should have been:
use strict; # Always use these as they
use warnings 'all'; # find/prevent numerous errors.
my %FOO = ( a=>1, b=>2 );
my %BAR = ( c=>3, d=>4 );
my %inputs = ( FOO => \%FOO, BAR => \%BAR );
print "Enter the name of a hash: ";
my $hash_name = <STDIN>;
chomp($hash_name);
my $hash = $inputs{$hash_name}
or die("No such hash\n");
my @keys = keys(%$hash);
...
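To illustrate why use strict matters for this question, here is a small sketch (the variable names are illustrative) showing that, under strict, dereferencing a plain string as a hash is a run-time error rather than a silent lookup of a package variable.

```perl
use strict;
use warnings;

my $name = 'FOO';                   # a plain string, not a hash reference

# Under "strict refs" the symbolic dereference dies; eval traps the error
# and returns an empty list.
my @keys  = eval { keys %$name };
my $error = $@;

print "keys: @keys\n";
print "error: $error";
```

The error message mentions "strict refs", which is the part of use strict that forbids using strings as references.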
keys() returns the keys of the specified hash. In the code you wrote, the name of the hash to look at (and extract the keys and values of) is being specified via STDIN, which is really bizarre behavior.
The code you posted is nonsensical, but what it should be doing is dereferencing a hash reference, provided that you have a valid hash reference stored in your scalar $hash (which you don't).
For example:
use strict;
use warnings;
use Data::Dump;
my $href = {
foo => 'bar',
bat => 'baz',
};
dd(keys(%$href)); # ("bat", "foo")
dd(values(%$href)); # ("baz", "bar")
The keys() function will return a list consisting of all the keys of the hash.
The returned values are copies of the original keys in the hash, so
modifying them will not affect the original hash.
The values() function does the exact same thing, except with the values of the hash (obviously).
So long as a given hash is unmodified you may rely on keys, values and
each to repeatedly return the same order as each other.
For more help with references, see perlreftut, perlref, and maybe perldsc if you're feeling adventurous.

Perl: safely make hash from list, checking for duplicates

In Perl if you have a list with an even number of elements you can straightforwardly convert it to a hash:
my @a = qw(each peach pear plum);
my %h = @a;
However, if there are duplicate keys then they will be silently accepted, with the last occurrence being the one used. I would like to make a hash checking that there are no duplicates:
my @a = qw(a x a y);
my %h = safe_hash_from_list(@a); # prints error: duplicate key 'a'
Clearly I could write that routine myself:
sub safe_hash_from_list {
die 'even sized list needed' if @_ % 2;
my %r;
while (@_) {
my $k = shift;
my $v = shift;
die "duplicate key '$k'" if exists $r{$k};
$r{$k} = $v;
}
return %r;
}
This, however, is quite a bit slower than the simple assignment. Moreover I do not want to use my own private routine if there is a CPAN module that already does the same job.
Is there a suitable routine on CPAN for safely turning lists into hashes? Ideally one that is a bit faster than the pure-Perl implementation above (though probably never quite as fast as the simple assignment).
If I may be allowed a related follow-up question, I'm also wondering about a tied hash class which allows each key to be assigned only once and dies on reassignment. That would be a more general case of the above problem. Again, I can write such a tied hash myself but I do not want to reinvent the wheel and I would prefer an optimized implementation if one already exists.
A quick way to check that no keys were duplicated is to count the keys and make sure the count equals half the number of items in the list:
my @a = ...;
my %h = @a;
if (keys %h == (@a / 2)) {
print "Success!";
}
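The key-count check can be wrapped in a sub, sketched below. The name hash_from_list_checked is hypothetical and this is not a CPAN routine; unlike the loop version in the question, it cannot say which key was duplicated.

```perl
use strict;
use warnings;

# Hypothetical helper: build a hash with a plain list assignment, then
# compare the number of keys with the number of pairs supplied.
sub hash_from_list_checked {
    my @pairs = @_;
    die "even sized list needed\n" if @pairs % 2;
    my %h = @pairs;
    die "duplicate key found\n" if keys(%h) != @pairs / 2;
    return %h;
}

my %ok = hash_from_list_checked(qw(each peach pear plum));

# A list with a repeated key dies; eval captures the message.
my $error = eval { hash_from_list_checked(qw(a x a y)); 1 } ? '' : $@;

print "each => $ok{each}, pear => $ok{pear}\n";
print "error: $error";
```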

Subroutine that returns hash - breaks it into separate variables

I have a subroutine that returns a hash. Last lines of the subroutine:
print Dumper(\%fileDetails);
return %fileDetails;
in this case the dumper prints:
$VAR1 = {
'somthing' => 0,
'somthingelse' => 7.68016712043654,
'else' => 'burst'
}
But when I try to dump it calling the subroutine with this line:
print Dumper(\fileDetailsSub($files[$i]));
the dumper prints:
$VAR1 = \'somthing';
$VAR2 = \0;
$VAR3 = \'somthingelse';
$VAR4 = \7.68016712043654;
$VAR5 = \'else';
$VAR6 = \'burst';
Once the hash is broken, I can't use it anymore.
Why does it happen? And how can I preserve the proper structure on subroutine return?
Thanks,
Mark.
There's no such thing as returning a hash in Perl.
Subroutines take lists as their arguments and they can return lists as their result. Note that a list is a very different creature from an array.
When you write
return %fileDetails;
This is equivalent to:
return ( 'something', 0, 'somethingelse', 7.68016712043654, 'else', 'burst' );
When you invoke the subroutine and get that list back, one thing you can do is assign it to a new hash:
my %result = fileDetailsSub();
That works because a hash can be initialized with a list of key-value pairs. (Remember that (foo => 42, bar => 43) is the same thing as ('foo', 42, 'bar', 43).)
Now, when you use the backslash reference operator on a hash, as in \%fileDetails, you get a hash reference, which is a scalar that points to a hash.
Similarly, if you write \@array, you get an array reference.
But when you use the reference operator on a list, you don't get a reference to a list. Lists are not variables (they are ephemeral), so they can't be referenced. Instead, the reference operator distributes over the list items, so
\( 'foo', 'bar', 'baz' );
makes a new list:
( \'foo', \'bar', \'baz' );
(In this case we get a list full of scalar references.) And this is what you're seeing when you try to Dumper the results of your subroutine: a reference operator distributed over the list of items returned from your sub.
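The distribution of the reference operator over a list can be checked directly. This is a small illustrative sketch with made-up variable names.

```perl
use strict;
use warnings;

# The backslash applied to a parenthesised list yields one reference
# per element, not a single reference to the whole list.
my @refs = \( 'foo', 'bar', 'baz' );

my $count  = @refs;            # number of references produced
my $kind   = ref $refs[0];     # type of each reference
my $second = ${ $refs[1] };    # dereference the second one

print "$count references of type $kind; the second points to '$second'\n";
# 3 references of type SCALAR; the second points to 'bar'
```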
So, one solution is to assign the result list to an actual hash variable before using Dumper. Another is to return a hash reference (what you're Dumpering anyway) from the sub:
return \%fileDetails;
...
my $details_ref = fileDetailsSub();
print Dumper( $details_ref );
# access it like this:
my $elem = $details_ref->{something};
my %copy = %{ $details_ref };
For more fun, see:
perldoc perlreftut - the Perl reference tutorial, and
perldoc perlref - the Perl reference reference.
Why not return a reference to the hash instead?
return \%fileDetails;
As long as it is a lexical variable, it will not complicate things with other uses of the subroutine. I.e.:
sub fileDetails {
my %fileDetails;
... # assign stuff
return \%fileDetails;
}
When the execution leaves the subroutine, the variable goes out of scope, but the data contained in memory remains.
The reason the Dumper output looks like that is that you are feeding it a referenced list. Subroutines cannot return arrays or hashes, they can only return lists of scalars. What you are doing is something like this:
print Dumper \(qw(something 0 somethingelse 7.123 else burst));
Perl functions cannot return hashes, only lists. A return %foo statement will flatten %foo into a list and return the flattened list. To get the return value to be interpreted as a hash, you can assign it to a named hash
%new_hash = fileDetailsSub(...);
print Dumper(\%new_hash);
or cast it (not sure if that is the best word for it) with a %{{...}} sequence of operations:
print Dumper( \%{ {fileDetailsSub(...)} } );
Another approach, as TLP points out, is to return a hash reference from your function.
You can't return a hash directly, but perl can automatically convert between hashes and lists as needed. So perl is converting that into a list, and you are capturing it as a list. i.e.
Dumper( filedetail() ) # list
my %fd = filedetail(); Dumper( \%fd ); #hash
In list context, Perl does not distinguish between a hash and a list of key/value pairs. That is, if a subroutine returns a hash, what it really returns is a list of (key1, value1, key2, value2...). Fortunately, that works both ways; if you take such a list and assign it to a hash, you get a faithful copy of the original:
my %fileDetailsCopy = subroutineName();
But if it wouldn't break other code, it would probably make more sense to have the sub return a reference to the hash instead, as TLP said.

assign a hash into a hash

I wish to assign a hash (returned by a method) into another hash, for a given key.
For example, a method returns a hash of this form:
$hash1->{'a'} = 'a1';
$hash1->{'b'} = 'b1';
Now, I wish to assign these hash values into another hash inside the calling method, to get something like:
$hash2->{'1'}->{'a'} = 'a1';
$hash2->{'1'}->{'b'} = 'b1';
Being new to perl, I'm not sure the best way to do this. But sounds trivial...
Your sub might be:
#!/usr/bin/env perl
use strict;
use warnings;
sub mystery
{
my($hashref) = { a => 'a1', b => 'b1' };
return $hashref;
}
my $hashref1 = mystery;
print "$hashref1->{a} and $hashref1->{b}\n";
my $hashref2 = { 1 => $hashref1 };
print "$hashref2->{1}->{a} and $hashref2->{1}->{b}\n";
One key point is that your notation for accessing the variables with the -> arrow operator is dealing with hash refs, not with plain hashes.
We have a 1st and a 2nd hash:
my %hash1 = (
a => 'a1',
b => 'b1');
my %hash2 = (1 => undef);
We can only assign scalar values to hashes, but this includes references. To take a reference, use the backslash operator:
$hash2{1} = \%hash1;
We can now dereference the values almost as in your example:
print $hash2{1}->{a}; # prints "a1"
Be careful to use the correct sigil ($@%) as appropriate. Use the sigil of the data type you expect, which is not necessarily the type you declared.
"perldoc perlreftut" might be interesting.
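If the method returns a plain hash (that is, a flattened list) rather than a reference, one option is to wrap the call in an anonymous hash constructor. The sketch below assumes a hypothetical sub named get_values standing in for the method in the question.

```perl
use strict;
use warnings;

# Hypothetical sub that returns a hash as a flattened key/value list.
sub get_values {
    my %values = ( a => 'a1', b => 'b1' );
    return %values;
}

my %hash2;

# The braces build a new anonymous hash from the returned list and
# store a reference to it under key 1.
$hash2{1} = { get_values() };

print "$hash2{1}{a} and $hash2{1}{b}\n";   # a1 and b1
```

The arrow between adjacent subscripts is optional, so $hash2{1}{a} and $hash2{1}->{a} mean the same thing.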

Is %$var dereferencing a Perl hash?

I'm sending a subroutine a hash, and fetching it with my($arg_ref) = @_;
But what exactly is %$arg_ref? Is %$ dereferencing the hash?
$arg_ref is a scalar since it uses the $ sigil. Presumably, it holds a hash reference. So yes, %$arg_ref dereferences that hash reference. Another way to write it is %{$arg_ref}. This makes the intent of the code a bit clearer, though more verbose.
To quote from perldata(1):
Scalar values are always named with '$', even when referring
to a scalar that is part of an array or a hash. The '$'
symbol works semantically like the English word "the" in
that it indicates a single value is expected.
$days # the simple scalar value "days"
$days[28] # the 29th element of array @days
$days{'Feb'} # the 'Feb' value from hash %days
$#days # the last index of array @days
So your example would be:
%$arg_ref # hash dereferenced from the value "arg_ref"
my($arg_ref) = @_; grabs the first item in the function's argument stack and places it in a local variable called $arg_ref. The caller is responsible for passing a hash reference. A more canonical way to write that is:
my $arg_ref = shift;
To create a hash reference you could start with a hash:
some_sub(\%hash);
Or you can create it with an anonymous hash reference:
some_sub({pi => 3.14, C => 4}); # Pi is a gross approximation.
Instead of dereferencing the entire hash like that, you can grab individual items with
$arg_ref->{key}
A good brief introduction to references (creating them and using them) in Perl is perldoc perlreftut. You can also read it online (or get it as a PDF). (It talks more about references in complex data structures than in terms of passing in and out of subroutines, but the syntax is the same.)
my %hash = ( fred => 'wilma',
barney => 'betty');
my $hashref = \%hash;
my $freds_wife = $hashref->{fred};
my %hash_copy = %$hashref; # or %{$hashref} as noted above.
So, what's the point of the syntax flexibility? Let's try this:
my %flintstones = ( fred => { wife => 'wilma',
kids => ['pebbles'],
pets => ['dino'],
},
barney => { }, # etc ...
);
Actually for deep data structures like this it's often more convenient to start with a ref:
my $flintstones = { fred => { wife => 'Wilma',
kids => ['Pebbles'],
pets => ['Dino'],
},
};
OK, so fred gets a new pet, 'Velociraptor'
push @{$flintstones->{fred}->{pets}}, 'Velociraptor';
How many pets does Fred have?
scalar @{ $flintstones->{fred}->{pets} }
Let's feed them ...
for my $pet ( @{ $flintstones->{fred}->{pets} } ) {
feed($pet);
}
and so on. The curly-bracket soup can look a bit daunting at first, but it becomes quite easy to deal with them in the end, so long as you're consistent in the way that you deal with them.
Since this construct is evidently being used to pass a hash reference as a set of named arguments to a sub, it should also be noted that this
sub foo {
my ($arg_ref) = @_;
# do something with $arg_ref->{keys}
}
may be overkill as opposed to just unpacking @_
sub bar {
my ($a, $b, $c) = @_;
return $c / ( $a * $b );
}
Depending on how complex the argument list is.
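When the argument list does get complex, the hash-reference style pays off because callers can name their arguments and omit optional ones. The following is only a sketch; the sub name ratio and its keys (numerator, denominator, scale) are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical sub taking named arguments through a single hash reference.
sub ratio {
    my ($arg_ref) = @_;

    # Optional argument with a default of 1.
    my $scale = exists $arg_ref->{scale} ? $arg_ref->{scale} : 1;

    return $scale * $arg_ref->{numerator} / $arg_ref->{denominator};
}

my $plain  = ratio({ numerator => 6, denominator => 3 });
my $scaled = ratio({ numerator => 6, denominator => 3, scale => 10 });

print "$plain $scaled\n";   # 2 20
```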