How to manipulate a hash-ref with Perl? - perl

Take a look at this code. After hours of trial and error, I finally got a solution, but I have no idea why it works, and to be quite honest, Perl is throwing me for a loop here.
use Data::Diff 'Diff';
use Data::Dumper;
my $out = Diff(\@comparr,\@grabarr);
my @uniq_a;
@temp = ();
my $x = @$out{uniq_a};
foreach my $y (@$x) {
@temp = ();
foreach my $z (@$y) {
push(@temp, $z);
}
push(@uniq_a, [$temp[0], $temp[1], $temp[2], $temp[3]]);
}
Why is it that the only way I can access the elements of the $out array is to pass a hash key into a scalar which has been cast as an array using a for loop? my $x = @$out{uniq_a}; I'm totally confused. I'd really appreciate anyone who can explain what's going on here so I'll know for the future. Thanks in advance.

$out is a hash reference, and you use the dereferencing operator ->{...} to access members of the hash that it refers to, like
$out->{uniq_a}
What you have stumbled on is Perl's hash slice notation, where you use the @ sigil in front of the name of a hash to conveniently extract a list of values from that hash. For example:
%foo = ( a => 123, b => 456, c => 789 );
$foo = { a => 123, b => 456, c => 789 };
print @foo{"b","c"}; # the values 456 and 789
print @$foo{"c","a"}; # the values 789 and 123
Using hash slice notation with a single element inside the braces, as you do, is not the typical usage and gives you the results you want by accident.

The Diff function returns a hash reference. You are accessing the element of this hash that has key uniq_a by extracting a one-element slice of the hash, instead of the correct $out->{uniq_a}. Your code should look like this
my $out = Diff(\@comparr, \@grabarr);
my @uniq_a;
my $uniq_a = $out->{uniq_a};
for my $list (@$uniq_a) {
my @temp = @$list;
push @uniq_a, [ @temp[0..3] ];
}
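To see why the one-element slice happened to work, here is a minimal sketch that does not need Data::Diff. The $out structure below is made-up sample data standing in for the hash reference that Diff returns; only the uniq_a key name comes from the question.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the hash reference returned by Diff().
my $out = {
    uniq_a => [ [ 1, 2, 3, 4 ], [ 5, 6, 7, 8 ] ],
    uniq_b => [ [ 9, 9, 9, 9 ] ],
};

# Idiomatic: the arrow operator fetches exactly one value.
my $by_arrow = $out->{uniq_a};

# One-element hash slice evaluated in scalar context: it yields the last
# (here: only) element of the slice, so it gives the same reference.
my $by_slice = @$out{uniq_a};

# Copy the first four elements of every inner list.
my @uniq_a = map { [ @{$_}[ 0 .. 3 ] ] } @$by_arrow;

print "same reference\n" if $by_arrow == $by_slice;
print scalar(@uniq_a), " rows copied\n";
```

Both lookups return the same array reference, which is why the original code ran; the arrow form simply states the intent directly.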

In the documentation for Data::Diff it states:
The value returned is always a hash reference and the hash will have
one or more of the following hash keys: type, same, diff, diff_a,
diff_b, uniq_a and uniq_b
So $out is a reference and you have to access the values through the mentioned keys.

Related

Understanding the 'foreach' syntax for the keys of a hash

I have created a simple Perl hash
Sample.pl
$skosName = 'foo';
$skosId = 'abc123';
$skosFile{'type'}{$skosId} = $skosName;
Later on I try to print the hash values using foreach.
This variant works
foreach $skosfile1type ( keys %{skosFile} ){
print ...
}
While this one doesn't
foreach $skosfile1type ( keys %{$skosFile} ) {
print ...
}
What is the difference between the two foreach statements?
In particular, what is the significance of the dollar sign $ in the statement that doesn't work?
Is it something to do with scope, or perhaps my omission of the my or our keywords?
%{skosFile} is the same as %skosFile. It refers to a hash variable with that name. Usually that form isn't used for a simple variable name, but it's allowed.
%{$skosFile} means to look at the scalar variable $skosFile (remember, in Perl, $foo, %foo, and @foo are distinct variables), and, expecting $skosFile to be a hashref, it returns the hash that the reference points to. It is equivalent to %$skosFile, but in fact any expression that returns a hashref can appear inside of %{...}.
The syntax %{ $scalar } is used to tell Perl that the type of $scalar is a hash ref and you want to undo the reference. That is why you need the dollar sign $: $skosFile is the variable you are trying to dereference.
In the same fashion, @{ $scalar } serves to dereference an array.
Although it does not work for complex constructions, in simple cases you may also abbreviate %{$scalar} to %$scalar and @{$scalar} to @$scalar.
In the case of the expression keys %{$skosFile}, keys needs a hash which you obtain by dereferencing $skosFile, a hash ref. In fact, the typical foreach loop for a hash looks like:
foreach my $key ( keys %hash ) {
# do something with $key
}
When you iterate a hash ref:
foreach my $key ( keys %{ $hashref } ) {
# do something with $key
}
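A minimal sketch of the difference, using made-up variables: a named hash %skos and an unrelated scalar $skos that holds a hash reference. In the question, $skosFile was never assigned, so dereferencing it finds no keys; with use strict the undeclared variable would have been reported at compile time.

```perl
use strict;
use warnings;

# A named hash and a separate scalar holding a reference to another hash.
# They share the base name "skos" only to show they are distinct variables.
my %skos = ( type => 'named hash' );
my $skos = { kind => 'hash behind a reference' };

my @named_keys = keys %{skos};     # same as keys %skos
my @ref_keys   = keys %{$skos};    # dereferences the scalar $skos
my @short_keys = keys %$skos;      # abbreviated form of the line above

print "named: @named_keys\n";
print "ref:   @ref_keys\n";
```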

Why I can use @list to call an array, but can't use %dict to call a hash in perl? [duplicate]

Today I started my Perl journey, and now I'm exploring the data types.
My code looks like:
@list=(1,2,3,4,5);
%dict=(1,2,3,4,5);
print "$list[0]\n"; # using [ ] to wrap index
print "$dict{1}\n"; # using { } to wrap key
print "@list[2]\n";
print "%dict{2}\n";
It seems $ + var_name works for both array and hash, but @ + var_name can be used to call an array, meanwhile % + var_name can't be used to call a hash.
Why?
@list[2] works because it is a slice of the array.
In Perl 5, a sigil indicates, in a non-technical sense, the context of your expression. Apart from some of the non-standard behavior that slices have in a scalar context, the basic thought is that the sigil represents what you want to get out of the expression.
If you want a scalar out of a hash, it's $hash{key}.
If you want a scalar out of an array, it's $array[0]. However, Perl allows you to get slices of the aggregates. And that allows you to retrieve more than one value in a compact expression. Slices take a list of indexes. So,
@list = @hash{ qw<key1 key2> };
gives you a list of items from the hash. And,
@list2 = @list[0..3];
gives you the first four items from the array. For your case, @list[2] still has a "list" of indexes; it's just that the list is the special case of a "list of one".
As scalar and list contexts were rather well defined, and there was no "hash context", it stayed pretty stable at $ for scalar and @ for "lists", and until recently Perl did not support addressing any variable with %. So neither %hash{@keys} nor %hash{key} had meaning. Now, however, you can dump out pairs of indexes with values by putting the % sigil on the front.
my %hash = qw<a 1 b 2>;
my @list = %hash{ qw<a b> }; # yields ( 'a', 1, 'b', 2 )
my @l2 = %list[0..2]; # yields ( 0, 'a', 1, '1', 2, 'b' )
So, I guess, if you have an older version of Perl, you can't, but if you have 5.20, you can.
But for a completist's sake, slices have a non-intuitive way that they work in a scalar context. The standard behavior of putting a list into a scalar context is to count the list. If a slice worked with that behavior, this would hold:
( $item = @hash{ @keys } ) == scalar @keys;
Which would make the expression:
$item = @hash{ @keys };
no more valuable than:
scalar @keys;
So, Perl seems to treat it like the expression:
$s = ( $hash{$keys[0]}, $hash{$keys[1]}, ... , $hash{$keys[$#keys]} );
And when a comma-delimited list is evaluated in a scalar context, it assigns the last expression. So it really ends up that
$item = @hash{ @keys };
is no more valuable than:
$item = $hash{ $keys[-1] };
But it makes writing something like this:
$item = @hash{ source1(), source2(), @array3, $banana, ( map { "$_" } source4() ) };
slightly easier than writing:
$item = $hash{ [ source1(), source2(), @array3, $banana, ( map { "$_" } source4() ) ]->[-1] };
But only slightly.
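The scalar-context behavior described above can be checked with a short sketch. The hash contents and the key order are made-up sample data, not taken from the question.

```perl
use strict;
use warnings;

my %hash = ( a => 1, b => 2, c => 3 );
my @keys = qw(a b c);

# List context: the slice returns one value per key, in key order.
my @values = @hash{@keys};

# Scalar context: the slice yields only its last element,
# i.e. the value stored under $keys[-1].
my $item = @hash{@keys};

print "values: @values\n";
print "item matches last key\n" if $item == $hash{ $keys[-1] };
```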
Arrays are interpolated within double quotes, so you see the actual contents of the array printed.
On the other hand, %dict{1} works, but is not interpolated within double quotes. So, something like my %partial_dict = %dict{1,3} is valid and does what you expect i.e. %partial_dict will now have the value (1,2,3,4). But "%dict{1,3}" (in quotes) will still be printed as %dict{1,3}.
Perl Cookbook has some tips on printing hashes.

Subroutine that returns hash - breaks it into separate variables

I have a subroutine that returns a hash. Last lines of the subroutine:
print Dumper(\%fileDetails);
return %fileDetails;
in this case the dumper prints:
$VAR1 = {
'somthing' => 0,
'somthingelse' => 7.68016712043654,
'else' => 'burst'
}
But when I try to dump it calling the subroutine with this line:
print Dumper(\fileDetailsSub($files[$i]));
the dumper prints:
$VAR1 = \'somthing';
$VAR2 = \0;
$VAR3 = \'somthingelse';
$VAR4 = \7.68016712043654;
$VAR5 = \'else';
$VAR6 = \'burst';
Once the hash is broken, I can't use it anymore.
Why does it happen? And how can I preserve the proper structure on subroutine return?
Thanks,
Mark.
There's no such thing as returning a hash in Perl.
Subroutines take lists as their arguments and they can return lists as their result. Note that a list is a very different creature from an array.
When you write
return %fileDetails;
This is equivalent to:
return ( 'something', 0, 'somethingelse', 7.68016712043654, 'else', 'burst' );
When you invoke the subroutine and get that list back, one thing you can do is assign it to a new hash:
my %result = fileDetailsSub();
That works because a hash can be initialized with a list of key-value pairs. (Remember that (foo => 42, bar => 43) is the same thing as ('foo', 42, 'bar', 43).)
Now, when you use the backslash reference operator on a hash, as in \%fileDetails, you get a hash reference, which is a scalar that points to a hash.
Similarly, if you write \@array, you get an array reference.
But when you use the reference operator on a list, you don't get a reference to a list: lists are ephemeral, not variables, so they can't be referenced. Instead, the reference operator distributes over list items, so
\( 'foo', 'bar', 'baz' );
makes a new list:
( \'foo', \'bar', \'baz' );
(In this case we get a list full of scalar references.) And this is what you're seeing when you try to Dumper the results of your subroutine: a reference operator distributed over the list of items returned from your sub.
So, one solution is to assign the result list to an actual hash variable before using Dumper. Another is to return a hash reference (what you're Dumpering anyway) from the sub:
return \%fileDetails;
...
my $details_ref = fileDetailsSub();
print Dumper( $details_ref );
# access it like this:
my $elem = $details_ref->{something};
my %copy = %{ $details_ref };
For more fun, see:
perldoc perlreftut - the Perl reference tutorial, and
perldoc perlref - the Perl reference reference.
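As a rough sketch of the three cases discussed above (list return, reference return, and the backslash distributing over a returned list), consider the following. The sub names and the sample data are hypothetical and stand in for fileDetailsSub.

```perl
use strict;
use warnings;

# Hypothetical sub that returns the hash itself: it is flattened to a list.
sub details_as_list {
    my %details = ( size => 7.68, mode => 'burst' );
    return %details;
}

# Hypothetical sub that returns one scalar: a reference to the hash.
sub details_as_ref {
    my %details = ( size => 7.68, mode => 'burst' );
    return \%details;
}

my %from_list = details_as_list();       # rebuild a hash from the list
my $from_ref  = details_as_ref();        # keep the reference
my @refs      = \( details_as_list() );  # backslash distributes over the list

print "mode via hash: $from_list{mode}\n";
print "mode via ref:  $from_ref->{mode}\n";
print scalar(@refs), " scalar references\n";
```

The last line reports four references, one per key and one per value, which is the shape Dumper showed in the question.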
Why not return a reference to the hash instead?
return \%fileDetails;
As long as it is a lexical variable, it will not complicate things with other uses of the subroutine. I.e.:
sub fileDetails {
my %fileDetails;
... # assign stuff
return \%fileDetails;
}
When the execution leaves the subroutine, the variable goes out of scope, but the data contained in memory remains.
The reason the Dumper output looks like that is that you are feeding it a referenced list. Subroutines cannot return arrays or hashes, they can only return lists of scalars. What you are doing is something like this:
print Dumper \(qw(something 0 somethingelse 7.123 else burst));
Perl functions cannot return hashes, only lists. A return %foo statement flattens %foo into a list and returns the flattened list. To get the return value to be interpreted as a hash, you can assign it to a named hash
%new_hash = fileDetailsSub(...);
print Dumper(\%new_hash);
or cast it (not sure if that is the best word for it) with a %{{...}} sequence of operations:
print Dumper( \%{ {fileDetailsSub(...)} } );
Another approach, as TLP points out, is to return a hash reference from your function.
You can't return a hash directly, but Perl can automatically convert between hashes and lists as needed. So Perl converts the hash into a list, and you capture it as a list, i.e.
Dumper( filedetail() ) # list
my %fd = filedetail(); Dumper( \%fd ); #hash
In list context, Perl does not distinguish between a hash and a list of key/value pairs. That is, if a subroutine returns a hash, what it really returns is a list of (key1, value1, key2, value2...). Fortunately, that works both ways; if you take such a list and assign it to a hash, you get a faithful copy of the original:
my %fileDetailsCopy = subroutineName();
But if it wouldn't break other code, it would probably make more sense to have the sub return a reference to the hash instead, as TLP said.

assign a hash into a hash

I wish to assign a hash (returned by a method) into another hash, for a given key.
For example, a method returns a hash of this form:
$hash1->{'a'} = 'a1';
$hash1->{'b'} = 'b1';
Now, I wish to assign these hash values into another hash inside the calling method, to get something like:
$hash2->{'1'}->{'a'} = 'a1';
$hash2->{'1'}->{'b'} = 'b1';
Being new to Perl, I'm not sure of the best way to do this, but it sounds trivial...
Your sub might be:
#!/usr/bin/env perl
use strict;
use warnings;
sub mystery
{
my($hashref) = { a => 'a1', b => 'b1' };
return $hashref;
}
my $hashref1 = mystery;
print "$hashref1->{a} and $hashref1->{b}\n";
my $hashref2 = { 1 => $hashref1 };
print "$hashref2->{1}->{a} and $hashref2->{1}->{b}\n";
One key point is that your notation for accessing the variables with the -> arrow operator is dealing with hash refs, not with plain hashes.
We have a 1st and a 2nd hash:
my %hash1 = (
a => 'a1',
b => 'b1');
my %hash2 = (1 => undef);
We can only assign scalar values to hashes, but this includes references. To take a reference, use the backslash operator:
$hash2{1} = \%hash1;
We can now dereference the values almost as in your example:
print $hash2{1}->{a}; # prints "a1"
Be careful to use the correct sigil ($@%) as appropriate. Use the sigil of the data type you expect, which is not necessarily the type you declared.
"perldoc perlreftut" might be interesting.
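One detail worth sketching is the difference between storing the returned reference and storing a copy of the hash it points to. The sub name get_settings is hypothetical; it stands in for the method in the question.

```perl
use strict;
use warnings;

# Hypothetical method returning a hash reference, as in the question.
sub get_settings {
    return { a => 'a1', b => 'b1' };
}

my $hash1 = get_settings();
my %hash2;

$hash2{1} = $hash1;          # shares the same underlying hash
$hash2{2} = { %$hash1 };     # independent shallow copy

# A later change through $hash1 is visible via key 1 but not via key 2.
$hash1->{a} = 'changed';

print "shared: $hash2{1}{a}\n";
print "copied: $hash2{2}{a}\n";
```

Which of the two is appropriate depends on whether the caller should see later changes made to the original hash.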

Is %$var dereferencing a Perl hash?

I'm sending a subroutine a hash, and fetching it with my($arg_ref) = @_;
But what exactly is %$arg_ref? Is %$ dereferencing the hash?
$arg_ref is a scalar since it uses the $ sigil. Presumably, it holds a hash reference. So yes, %$arg_ref dereferences that hash reference. Another way to write it is %{$arg_ref}. This makes the intent of the code a bit more clear, though more verbose.
To quote from perldata(1):
Scalar values are always named with '$', even when referring
to a scalar that is part of an array or a hash. The '$'
symbol works semantically like the English word "the" in
that it indicates a single value is expected.
$days # the simple scalar value "days"
$days[28] # the 29th element of array @days
$days{'Feb'} # the 'Feb' value from hash %days
$#days # the last index of array @days
So your example would be:
%$arg_ref # hash dereferenced from the value "arg_ref"
my($arg_ref) = @_; grabs the first item in the function's argument stack and places it in a local variable called $arg_ref. The caller is responsible for passing a hash reference. A more canonical way to write that is:
my $arg_ref = shift;
To create a hash reference you could start with a hash:
some_sub(\%hash);
Or you can create it with an anonymous hash reference:
some_sub({pi => 3.14, C => 4}); # Pi is a gross approximation.
Instead of dereferencing the entire hash like that, you can grab individual items with
$arg_ref->{key}
A good brief introduction to references (creating them and using them) in Perl is perldoc perlreftut. You can also read it online (or get it as a pdf). (It talks more about references in complex data structures than in terms of passing in and out of subroutines, but the syntax is the same.)
my %hash = ( fred => 'wilma',
barney => 'betty');
my $hashref = \%hash;
my $freds_wife = $hashref->{fred};
my %hash_copy = %$hashref; # or %{$hashref} as noted above.
So, what's the point of the syntax flexibility? Let's try this:
my %flintstones = ( fred => { wife => 'wilma',
kids => ['pebbles'],
pets => ['dino'],
},
barney => { }, # etc ...
);
Actually for deep data structures like this it's often more convenient to start with a ref:
my $flintstones = { fred => { wife => 'Wilma',
kids => ['Pebbles'],
pets => ['Dino'],
},
};
OK, so fred gets a new pet, 'Velociraptor'
push @{$flintstones->{fred}->{pets}}, 'Velociraptor';
How many pets does Fred have?
scalar @{ $flintstones->{fred}->{pets} }
Let's feed them ...
for my $pet ( @{ $flintstones->{fred}->{pets} } ) {
feed($pet);
}
and so on. The curly-bracket soup can look a bit daunting at first, but it becomes quite easy to deal with them in the end, so long as you're consistent in the way that you deal with them.
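Put together as one runnable sketch, the steps above look roughly like this. The feed sub is a hypothetical helper defined only so the loop has something to call; here it just records which pets were fed.

```perl
use strict;
use warnings;

my $flintstones = {
    fred => { wife => 'Wilma', kids => ['Pebbles'], pets => ['Dino'] },
};

# Fred gets a new pet: dereference the inner array and push onto it.
push @{ $flintstones->{fred}{pets} }, 'Velociraptor';

# Count the pets by evaluating the dereferenced array in scalar context.
my $pet_count = scalar @{ $flintstones->{fred}{pets} };

# feed() is a hypothetical helper, defined here only so the loop runs.
my @fed;
sub feed { my ($pet) = @_; push @fed, $pet; return }

feed($_) for @{ $flintstones->{fred}{pets} };

print "$pet_count pets fed: @fed\n";
```

The arrow between adjacent subscripts is optional, so $flintstones->{fred}{pets} and $flintstones->{fred}->{pets} name the same element.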
Since it's fairly clear that this construct is being used to pass a hash reference as a list of named arguments to a sub, it should also be noted that this
sub foo {
my ($arg_ref) = @_;
# do something with $arg_ref->{keys}
}
may be overkill compared to just unpacking @_
sub bar {
my ($a, $b, $c) = @_;
return $c / ( $a * $b );
}
depending on how complex the argument list is.
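For comparison, here is a small sketch of the two calling styles side by side. The sub names and the width, height and depth parameters are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical sub taking named arguments through a hash reference.
sub volume_named {
    my ($arg_ref) = @_;
    return $arg_ref->{width} * $arg_ref->{height} * $arg_ref->{depth};
}

# The same calculation with positional arguments unpacked from @_.
sub volume_positional {
    my ( $width, $height, $depth ) = @_;
    return $width * $height * $depth;
}

my $named      = volume_named( { width => 2, height => 3, depth => 4 } );
my $positional = volume_positional( 2, 3, 4 );

print "named: $named, positional: $positional\n";
```

With three simple arguments the positional form is shorter; the hash reference tends to pay off once arguments become numerous or optional.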