Comparing two hashes in perl using ne - perl

I am trying to understand a piece of code in perl, but I am having some trouble with it being sort of new to perl programming.
I have two hashes, which are being input the same (key,value) pairs in the same order in different iterations of a for loop.
Iteration 1 creates %hash1, and Iteration 2 creates %hash2.
%hash1 = (1 => 10, 2 => 20, 3=> 30);
%hash2 = (1 => 10, 2 => 20, 3=> 30);
Then a command that compares these: goes as,
if (%hash1 ne %hash2) {print "Not Equal"; die;}
My question is:
(1) What exactly is compared in the above if statement?
(2) I tried assigning,
my $a = %hash1; my $b = %hash2;
But these give me outputs like 3/8!
What could that be?
Any help would be greatly appreciated.

ne is the string comparison operator. It's operands are strings, and thus scalars. From perldata,
If you evaluate a hash in scalar context, it returns false if the hash is empty. If there are any key/value pairs, it returns true; more precisely, the value returned is a string consisting of the number of used buckets and the number of allocated buckets, separated by a slash.
So it's comparing that both hashes have the same number of used buckets and that both hashes have the same number of allocated buckets.
One way to compare the hashes would be to stringify them using JSON:XS with canonical set.
JSON::XS->new->canonical(1)->encode(\%hash)

There is a Module Data::Compare available for comparing hashes on CPAN. This works as follows:
use Data::Compare; # exports subroutine: Compare() !
...
my %hash1 = (1 => 10, 2 => 20, 3 => 30);
my %hash2 = (1 => 10, 2 => 20, 3 => 30);
# This won't work:
# if (%hash1 ne %hash2) {print "Not Equal"; die;}
# This works:
if( ! Compare(\%hash1, \%hash2) ) { print "Not Equal"; die; }
...
This is not a core module, you'll have to install it. It is also available under activeperl/windows (in their default repository).
Regards,
rbo

Related

How does Perl handle a hash assignment to a scalar variable? [duplicate]

Consider the following snippet:
use strict;
use warnings;
my %a = ( a => 1,
b => 2,
c => 'cucu',
d => undef,
r => 1,
br => 2,
cr => 'cucu',
dr => '321312321',
);
my $c = %a;
print $c;
The result of this is 5/8 and I don't understand what this represents. I read somewhere that a number from this fraction looking result might represent the number of buckets from the hash, but clearly this is not the case.
Does anyone knows how a perl hash is evaluated in scalar context?
Edit
I added a few other hashes to print:
use strict;
use warnings;
use 5.010;
my %a = ( a => 1,
b => 2,
c => 'cucu',
d => undef,
r => 1,
br => 2,
cr => 'cucu',
dr => '321312321',
);
my $c = %a;
say $c; # 5/8
%a = ( a => 1,
b => 21,
c => 'cucu',
br => 2,
cr => 'cucu',
dr => '321312321',
);
$c = %a;
say $c; # 4/8
%a = ( a => 1,
b => 2,
c => 'cucu',
d => undef,
r => 1,
br => 2,
cr => 'cucu',
dr => '321312321',
drr => '32131232122',
);
$c = %a;
say $c; #6/8
So, you call a 'tuple' like a => 1 a bucket in the hash? in that case, why is the last hash still having 8 as a denominator when it has 9 'tuples' ?
Thank you all for your responses until now :)
[The OP is asking about the format of the string returned by a hash in scalar context before Perl 5.26. Since Perl 5.26, a hash in scalar context no longer returns a string in this format, returning the number of elements in the hash instead. If you need the value discussed here, you can use Hash::Util's bucket_ratio().]
A hash is an array of linked lists. A hashing function converts the key into a number which is used as the index of the array element ("bucket") into which to store the value. The linked list handles the case where more than one key hashes to the same index ("collision").
The denominator of the fraction is the total number of buckets.
The numerator of the fraction is the number of buckets which has one or more elements.
For hashes with the same number of elements, the higher the number, the better. The one that returns 6/8 has fewer collisions than the one that returns 4/8.
From perldoc perldata:
If you evaluate a hash in scalar context, it returns false if the hash
is empty. If there are any key/value pairs, it returns true; more
precisely, the value returned is a string consisting of the number of
used buckets and the number of allocated buckets, separated by a
slash.
In your case, you have five values (1,2,''cucu',undef, and '321312321') that have been mapped to by eight keys (a,b,c,d,r,br,cr, and dr).
The behaviour has changed since Perl 5.25. See perldata for Perl 5.26:
Prior to Perl 5.25 the value returned was a string consisting of the
number of used buckets and the number of allocated buckets, separated by
a slash. This is pretty much useful only to find out whether Perl's
internal hashing algorithm is performing poorly on your data set. For
example, you stick 10,000 things in a hash, but evaluating %HASH in
scalar context reveals 1/16, which means only one out of sixteen
buckets has been touched, and presumably contains all 10,000 of your
items. This isn't supposed to happen.
As of Perl 5.25 the return was changed to be the count of keys in the
hash. If you need access to the old behavior you can use
Hash::Util::bucket_ratio() instead.
The number of used buckets starts out to be approximately the number of keys; allocated buckets is consistently the lowest power of 2 > the number of keys. 5 keys will return 5/8. Larger numbers of unique keys grow slower, such that a hash %h that is just the list (1..128), with 64 key/value pairs, somehow gets a scalar value of 50/128.
However, once the hash has allocated its buckets, they will remain allocated even if you shrink the hash. I just made a hash %h with 9 pairs, thus 9/16 scalar; then when I reassigned %h to have just one pair, its scalar value was 1/16.
This actually makes sense in that it lets you test the hash's size, like a scalar of a simple array does.
To focus too much on this fractional pattern (as an indicator for internal details of the hash), can be confusing. There is an aspect of the "scalar value" of a hash that is important for potentially every Perl program and that is, if it is considered true in Boolean context, see an example:
if (%h) {
print "Entries in hash:\n";
for my $k (sort keys %h) {
print "$k: $h{$k}\n";
}
}
In perldoc perldata, section Scalar-values, you can read that
[...] The Boolean context is just a special kind of scalar context where no conversion to a string or a number is ever performed.
and, some paragraphs later,
If you evaluate a hash in scalar context, it returns false if the hash
is empty. If there are any key/value pairs, it returns true [...]

How to print specific key in an array (Perl) [duplicate]

This question already has answers here:
Simple hash search by value
(5 answers)
Closed 5 years ago.
I recently started learning Perl, so I'm not too familiar with the functions and syntax.
If I have a Perl array and some variables,
#!/usr/bin/perl
use strict;
use warnings;
my #numbers = (a =>1, b=> 2, c => 3, d =>4, e => 5);
my $x;
my $range = 5;
$x = int(rand($range));
print "$x";
to generate a random number between 1-5, how can I get the program to print the actual key (a, b, c, etc.) instead of just the number (1, 2, 3, 4, 5)?
It seems that you want to do a reverse lookup, key-by-value, opposite to what we get from a hash. Since a hash is a list you can reverse it and use the resulting hash to look up by number.
A couple of corrections: you need a hash variable (not an array), and you need to add 1 to your rand integer generator so to have the desired 1..5 range
use warnings;
use strict;
use feature 'say';
my %numbers = (a => 1, b => 2, c => 3, d => 4, e => 5);
my %lookup_by_number = reverse %numbers; # values need be unique
my $range = 5;
my $x = int(rand $range) + 1;
say $lookup_by_number{$x};
Without reversing the hash you'd need to iterate the hash %numbers over values, testing each against $x so to find its key.
If there are same values for various keys in your original hash then you have to do it by hand since reverse-ing would attempt to create a hash with duplicate keys, in which case only the last one assigned remains. So you'd lose some values. One way
my #at_num = grep { $x == $numbers{$_} } keys %numbers;
as in the post that this was marked as duplicate of.
But then you should build a data structure for reverse lookup so to not search through the list every time information is needed. This can be a hash where keys are the list of unique numbers while their values are then array references (arrayrefs) with corresponding keys from the original hash
use warnings;
use strict;
my %num = (a => 1, b => 2, c => 1, d => 3, e => 2); # with duplicate values
my %lookup_by_num;
foreach my $key (keys %num) {
push #{ $lookup_by_num{$num{$key}} }, $key;
}
say "$_ => [ #{$lookup_by_num{$_}} ]" for keys %lookup_by_num;
This prints
1 => [ c a ]
3 => [ d ]
2 => [ e b ]
A nice way to display complex data structures is via Data::Dumper, or Data::Dump (or others).
The expression #{ $lookup_by_num{ $num{$key} } } extracts the value of %lookup_by_num for the key $num{$key}and dereferences it #{ ... }, so that it can then push the $key to it. The critical part of this is that the first time it encounters $num{$key} it autovivifies the arrayref and its corresponding key. See this post with its references for details.
There's many ways to do it. For example, declare "numbers" as a hash rather than an array. Note that the keys come first in each key-value pair, and here you want to use your random int as the key:
my %numbers = ( 0 => 'a', 1 => 'b', 2 => 'c', 3 => 'd', 4 => 'e' );
Then you can look up the "key" as you call it using:
my $key = $numbers{$x};
Note that rand( $x ); returns a number greater than or equal to zero and less than $x. So if you want integers in the range 1-5, you must add 1 in your code: at the moment you'll get 0-4, not 1-5.
Firstly, arrays don't have keys (well, they kind of do, but they're integers and not the values you want). So I think you want a hash, not an array.
my %numbers = (a =>1, b=> 2, c => 3, d =>4, e => 5);
And if you want to get the letter, given the integer then you need the reverse of this hash:
my %rev_numbers = %numbers;
Note that reversing a hash like this only works if the values in your original hash are unique (because reversing a hash makes the values into keys and hash keys are always unique).
Then, you can just look up an integer in your %rev_hash to get its associated letter.
my $integer = 3;
say $rev_numbers{$integer}; # prints 'c'

Perl: Access hash of dynamic depth

I am struggling with accessing/ modifying hashes of unknown (i.e. dynamic) depth.
Suppose I am reading in a table of measurements (Length, Width, Height) from a file, then calculating Area and Volume to create a hash like the following:
# #Length Width Height Results
my %results = (
'2' => {
'3' => {
'7' => {
'Area' => 6,
'Volume' => 42,
},
},
},
'6' => {
'4' => {
'2' => {
'Area' => 24,
'Volume' => 48,
},
},
},
);
I understand how to access a single item in the hash, e.g. $results{2}{3}{7}{'Area'} would give me 6, or I could check if that combination of measurements has been found in the input file with exists $results{2}{3}{7}{'Area'}. However that notation with the series of {} braces assumes I know when writing the code that there will be 4 layers of keys.
What if there are more or less and I only discover that at runtime? E.g. if there were only Length and Width in the file, how would you make code that would then access the hash like $results{2}{3}{'Area'}?
I.e. given a hash and dynamic-length list of nested keys that may or may not have a resultant entry in that hash, how do you access the hash for basic things like checking if that key combo has a value or modifying the value?
I almost want a notation like:
my #hashkeys = (2,3,7);
if exists ( $hash{join("->",#hashkeys)} ){
print "Found it!\n";
}
I know you can access sub-hashes of a hash and get their references so in this last example I could iterate through #hashkeys, checking for each one if the current hash has a sub-hash at that key and if so, saving a reference to that sub-hash for the next iteration. However, that feels complex and I suspect there is already a way to do this much easier.
Hopefully this is enough to understand my question but I can try to work up a MWE if not.
Thanks.
So here's a recursive function which does more or less what you want:
sub fetch {
my $ref = shift;
my $key = shift;
my #remaining_path = #_;
return undef unless ref $ref;
return undef unless defined $ref->{$key};
return $ref->{$key} unless scalar #remaining_path;
return fetch($ref->{$key}, #remaining_path);
}
fetch(\%results, 2, 3, 7, 'Volume'); # 42
fetch(\%results, 2, 3); # hashref
fetch(\%results, 2, 3, 7, 'Area', 8); # undef
fetch(\%results, 2, 3, 8, 'Area'); # undef
But please check the comment about bad data structure which is already given by someone else, because it's very true. And if you still think that this is what you need, at least rewrite it using a for-loop, as perl does not optimize tail recursion.
Take a look at $; in "man perlvar".
http://perldoc.perl.org/perlvar.html#%24%3b
You may use the idea to convert variable length array into single key.
my %foo;
my (#KEYS)=(2,3,7);
$foo{ join( $; , #KEYS ) }{Area}=6;
$foo{ join( $; , #KEYS ) }{Volume}=42;

Why I can use #list to call an array, but can't use %dict to call a hash in perl? [duplicate]

This question already has answers here:
Why do you need $ when accessing array and hash elements in Perl?
(9 answers)
Closed 8 years ago.
Today I start my perl journey, and now I'm exploring the data type.
My code looks like:
#list=(1,2,3,4,5);
%dict=(1,2,3,4,5);
print "$list[0]\n"; # using [ ] to wrap index
print "$dict{1}\n"; # using { } to wrap key
print "#list[2]\n";
print "%dict{2}\n";
it seems $ + var_name works for both array and hash, but # + var_name can be used to call an array, meanwhile % + var_name can't be used to call a hash.
Why?
#list[2] works because it is a slice of a list.
In Perl 5, a sigil indicates--in a non-technical sense--the context of your expression. Except from some of the non-standard behavior that slices have in a scalar context, the basic thought is that the sigil represents what you want to get out of the expression.
If you want a scalar out of a hash, it's $hash{key}.
If you want a scalar out of an array, it's $array[0]. However, Perl allows you to get slices of the aggregates. And that allows you to retrieve more than one value in a compact expression. Slices take a list of indexes. So,
#list = #hash{ qw<key1 key2> };
gives you a list of items from the hash. And,
#list2 = #list[0..3];
gives you the first four items from the array. --> For your case, #list[2] still has a "list" of indexes, it's just that list is the special case of a "list of one".
As scalar and list contexts were rather well defined, and there was no "hash context", it stayed pretty stable at $ for scalar and # for "lists" and until recently, Perl did not support addressing any variable with %. So neither %hash{#keys} nor %hash{key} had meaning. Now, however, you can dump out pairs of indexes with values by putting the % sigil on the front.
my %hash = qw<a 1 b 2>;
my #list = %hash{ qw<a b> }; # yields ( 'a', 1, 'b', 2 )
my #l2 = %list[0..2]; # yields ( 0, 'a', 1, '1', 2, 'b' )
So, I guess, if you have an older version of Perl, you can't, but if you have 5.20, you can.
But for a completist's sake, slices have a non-intuitive way that they work in a scalar context. Because the standard behavior of putting a list into a scalar context is to count the list, if a slice worked with that behavior:
( $item = #hash{ #keys } ) == scalar #keys;
Which would make the expression:
$item = #hash{ #keys };
no more valuable than:
scalar #keys;
So, Perl seems to treat it like the expression:
$s = ( $hash{$keys[0]}, $hash{$keys[1]}, ... , $hash{$keys[$#keys]} );
And when a comma-delimited list is evaluated in a scalar context, it assigns the last expression. So it really ends up that
$item = #hash{ #keys };
is no more valuable than:
$item = $hash{ $keys[-1] };
But it makes writing something like this:
$item = $hash{ source1(), source2(), #array3, $banana, ( map { "$_" } source4()};
slightly easier than writing:
$item = $hash{ [source1(), source2(), #array3, $banana, ( map { "$_" } source4()]->[-1] }
But only slightly.
Arrays are interpolated within double quotes, so you see the actual contents of the array printed.
On the other hand, %dict{1} works, but is not interpolated within double quotes. So, something like my %partial_dict = %dict{1,3} is valid and does what you expect i.e. %partial_dict will now have the value (1,2,3,4). But "%dict{1,3}" (in quotes) will still be printed as %dict{1,3}.
Perl Cookbook has some tips on printing hashes.

How to alter the hash values with new values in perl?

How can i reset the hash values in perl
use warnings;
use strict;
my %hash = qw(one 1 two 2 three 3 four 4);
my #key = keys(%hash);
my #avz = (9..12);
my %vzm;
print "Original hash and keys : ",%hash,"\n";
for(my $i = 0; $i<=scalar #avz; $i++){
my #new = "$key[$i] $avz[$i] ";
push(%vzm , #new);
}
print "modified hash and keys",%vzm,"\n";
I tried to alter the keys of original hash with another keys. How can i do it
This program give the error is:
Original hash and keys : three3one1two2four4
Not an ARRAY reference at key.pl line 10.
I expect the output is
Original hash and keys : three3one1two2four4
modified hash and keys : three11one9two10four12
How can i do it
Ok, first off - you're doing something nasty in your code:
You're trying to take an ordered data structure - an array - and push it into a keyed data structure, which has no particular ordering defined.
This isn't going to work very well - it technically works, because internally perl treats arrays and hashes similarly.
But for example your first assignment - what you're actually getting is:
my %hash = (
one => 1,
two => 2,
three => 3,
four => 4
);
You can access the keys (in no particular order) via keys(). And the values via values(). But to try and treat it like an array is undefined behaviour.
To add elements to your array:
$hash{'nine'} = 9;
To delete elements from your array:
delete ( $hash{'one'} );
You can iterate on keys or values - and combined with sort even do them in some sort of order. (Just bear in mind for sorting alphanumeric numbers you'll have a custom sort job).
foreach my $key ( sort keys %hash ) {
print "$key => $hash{$key}\n";
}
(Note - this is sorting by alphanumeric string, so gives:
four => 4
one => 1
three => 3
two => 2
If you want to sort by value:
foreach my $key ( sort { $hash{$a} <=> $hash{$b} } keys %hash ) {
print "$key => $hash{$key}\n";
}
And so you'll get:
one => 1
two => 2
three => 3
four => 4
So the real question remains - what are you actually trying to accomplish? The point of a hash is to give you an unordered mini-database of key-value pairs. Treating one like an array doesn't make an awful lot of sense. Either you're iterating hash elements in arbitrary order, or you're applying a specific sort to it - but one where you're relying on getting elements in a particular order is a bad plan - it may work, but it's not guaranteed to work, and that makes for bad code.
You have to keep the order of the keys in some array, or take it from original list
my #tmp = qw(one 1 two 2 three 3 four 4);
my %hash = #tmp;
# 'one', 'two', ..
my #key = #tmp[ grep !($_%2), 0 .. $#tmp ];
# ..
for my $i (0 .. $#avz) {
$vzm{ $key[$i] } = $avz[$i];
}
or using hash slice as more perlish approach,
#vzm{ #key } = #avz;
You can't do what you want (replace the values for keys in the hash in the order they originally were added) without keeping track of that order separately, since the hash doesn't have any particular order. In other words, this:
my #key = keys(%hash);
needs to be this:
my #key = ( 'one', 'two', 'three', 'four' );
Once you have that, you can just assign the values all at once with a hash slice:
my %vzm;
#vzm{#key} = #avz;
To create a hash element, you use assignment to $var{$key}.
for (my $i = 0; $i < scalar #avz; $i++) {
$vzm{$key[$i]} = $avz[$i];
}
Note also that the loop condition should be <, not <=. List/array indexes end at scalar #avz - 1.