How does Perl handle a hash assignment to a scalar variable? [duplicate] - perl

Consider the following snippet:
use strict;
use warnings;
my %a = ( a => 1,
b => 2,
c => 'cucu',
d => undef,
r => 1,
br => 2,
cr => 'cucu',
dr => '321312321',
);
my $c = %a;
print $c;
The result of this is 5/8 and I don't understand what this represents. I read somewhere that a number from this fraction looking result might represent the number of buckets from the hash, but clearly this is not the case.
Does anyone know how a Perl hash is evaluated in scalar context?
Edit
I added a few other hashes to print:
use strict;
use warnings;
use 5.010;
my %a = ( a => 1,
b => 2,
c => 'cucu',
d => undef,
r => 1,
br => 2,
cr => 'cucu',
dr => '321312321',
);
my $c = %a;
say $c; # 5/8
%a = ( a => 1,
b => 21,
c => 'cucu',
br => 2,
cr => 'cucu',
dr => '321312321',
);
$c = %a;
say $c; # 4/8
%a = ( a => 1,
b => 2,
c => 'cucu',
d => undef,
r => 1,
br => 2,
cr => 'cucu',
dr => '321312321',
drr => '32131232122',
);
$c = %a;
say $c; #6/8
So, you call a 'tuple' like a => 1 a bucket in the hash? in that case, why is the last hash still having 8 as a denominator when it has 9 'tuples' ?
Thank you all for your responses until now :)

[The OP is asking about the format of the string returned by a hash in scalar context before Perl 5.26. Since Perl 5.26, a hash in scalar context no longer returns a string in this format, returning the number of elements in the hash instead. If you need the value discussed here, you can use Hash::Util's bucket_ratio().]
A hash is an array of linked lists. A hashing function converts the key into a number which is used as the index of the array element ("bucket") into which to store the value. The linked list handles the case where more than one key hashes to the same index ("collision").
The denominator of the fraction is the total number of buckets.
The numerator of the fraction is the number of buckets which have one or more elements.
For hashes with the same number of elements, the higher the numerator, the better. The one that returns 6/8 has fewer collisions than the one that returns 4/8.
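On Perl 5.26 or later, the fraction can still be inspected through Hash::Util. The following is a minimal sketch, assuming a Perl recent enough to ship bucket_ratio; the hash %h simply reuses the key names from the question, and the exact numbers printed will vary between runs because of hash randomization.

```perl
use strict;
use warnings;
use Hash::Util qw(bucket_ratio);

# Illustrative data: the eight key names from the question.
my %h = map { $_ => 1 } qw(a b c d r br cr dr);

# For a non-empty hash, bucket_ratio returns the old "used/allocated" string.
my $ratio = bucket_ratio(%h);
my ($used, $allocated) = split m{/}, $ratio;

print "ratio: $ratio\n";
print "$used of $allocated buckets hold at least one key\n";
```

Splitting on the slash gives the numerator (used buckets) and the denominator (allocated buckets) separately.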

From perldoc perldata:
If you evaluate a hash in scalar context, it returns false if the hash
is empty. If there are any key/value pairs, it returns true; more
precisely, the value returned is a string consisting of the number of
used buckets and the number of allocated buckets, separated by a
slash.
In your case, the eight keys (a, b, c, d, r, br, cr, and dr) occupy five of the eight allocated buckets. The 5 counts used buckets, not distinct values.

The behaviour has changed since Perl 5.25. See perldata for Perl 5.26:
Prior to Perl 5.25 the value returned was a string consisting of the
number of used buckets and the number of allocated buckets, separated by
a slash. This is pretty much useful only to find out whether Perl's
internal hashing algorithm is performing poorly on your data set. For
example, you stick 10,000 things in a hash, but evaluating %HASH in
scalar context reveals 1/16, which means only one out of sixteen
buckets has been touched, and presumably contains all 10,000 of your
items. This isn't supposed to happen.
As of Perl 5.25 the return was changed to be the count of keys in the
hash. If you need access to the old behavior you can use
Hash::Util::bucket_ratio() instead.

The number of used buckets starts out approximately equal to the number of keys; the number of allocated buckets is a power of 2, typically the lowest one greater than the number of keys (although the 9-key hash in the question still reports 8). 5 keys will return 5/8. With larger numbers of unique keys, the used-bucket count grows more slowly, such that a hash %h built from the list (1..128), with 64 key/value pairs, gets a scalar value of 50/128.
However, once the hash has allocated its buckets, they will remain allocated even if you shrink the hash. I just made a hash %h with 9 pairs, thus 9/16 scalar; then when I reassigned %h to have just one pair, its scalar value was 1/16.
This actually makes sense in that it lets you test the hash's size, like a scalar of a simple array does.

Focusing too much on this fractional pattern (as an indicator of internal details of the hash) can be confusing. One aspect of the "scalar value" of a hash matters for potentially every Perl program: whether it is considered true in Boolean context. For example:
if (%h) {
print "Entries in hash:\n";
for my $k (sort keys %h) {
print "$k: $h{$k}\n";
}
}
In perldoc perldata, section Scalar-values, you can read that
[...] The Boolean context is just a special kind of scalar context where no conversion to a string or a number is ever performed.
and, some paragraphs later,
If you evaluate a hash in scalar context, it returns false if the hash
is empty. If there are any key/value pairs, it returns true [...]
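The truth test works the same way regardless of what string or number the hash would otherwise produce in scalar context. A small sketch (describe_hash is a hypothetical helper, not something from the documentation):

```perl
use strict;
use warnings;

# Hypothetical helper: summarize a hash via its truth value and key count.
sub describe_hash {
    my ($h) = @_;
    return 'empty' unless %$h;               # false only when there are no pairs
    return scalar(keys %$h) . ' entries';    # keys in scalar context gives the count
}

print describe_hash({}), "\n";                    # empty
print describe_hash({ a => 1, b => 2 }), "\n";    # 2 entries
```

Using keys in scalar context for the count avoids depending on which Perl version is evaluating the hash itself.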

Related

perl - usage of $ symbol

I know $, @ and % are for declaring scalars, arrays and hashes. But I am confused when $ is used in the declaration of other things like
my %myhash1 = ( a => 1, b => 2 );
my $myhash2 = { A => 27, B => 27};
and usage of following syntax
%myhash1
%$myhash2
Can someone please explain me the difference and when to use them.
No, the sigils refer to the mode of access. $ means a single element, @ means multiple elements and % means key/value pairs. Demo:
no references involved
$myhash1{a} # 1
@myhash1{qw(a b)} # (1, 2)
%myhash1{a} # (a => 1)
with references involved
$$myhash2{A} # 27
@$myhash2{qw(A B)} # (27, 27)
%$myhash2{A} # (A => 27)
which is just short-hand for
${ $myhash2 }{A} # 27
@{ $myhash2 }{qw(A B)} # (27, 27)
%{ $myhash2 }{A} # (A => 27)
which is much more clearly written using the -> deref operator
$myhash2->{A} # 27
$myhash2->@{qw(A B)} # (27, 27)
$myhash2->%{A} # (A => 27)
Prefer the notation shown in the last block.
Lack of subscripting braces:
%myhash1 means: all k/v pairs in the hash.
Any of %$myhash2, %{ $myhash2 }, $myhash2->%* means: all k/v pairs in the dereferenced hash.
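The access modes can be checked with a short runnable script. This is a sketch that assumes Perl 5.20 or later for the key/value slice; the value of B is changed to 28 (an assumption for illustration) so that the two slice elements are distinguishable.

```perl
use strict;
use warnings;

my %myhash1 = ( a => 1, b => 2 );
my $myhash2 = { A => 27, B => 28 };    # B altered from the question for clarity

# No references involved.
my $single = $myhash1{a};              # $ : one element
my @values = @myhash1{qw(a b)};        # @ : several values (hash slice)
my %subset = %myhash1{'a'};            # % : key/value slice (5.20+)

# Same ideas through a hash reference.
my $ref_single = $myhash2->{A};
my @ref_values = @{$myhash2}{qw(A B)};
my %copy       = %$myhash2;            # all pairs of the dereferenced hash

print "$single | @values | @ref_values\n";
```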
Perl 5 sigils (the $, @, and %) can be confusing. Here's a condensed explanation. (The Perl 6 rules are different and I'm skipping them here, and as @ikegami notes, there are some other features, like hash and array slices, that I'm also skipping for concision.)
A dollar sign sigil always means "I am dealing with a scalar value". So the variable $foo is a variable containing a single scalar value (there are cases, as noted, where a scalar actually contains multiple different things that are accessed in different contexts, but this is not important for what we're interested in right now).
This is true when you're accessing array and hash elements: $foo[0] says "I am accessing the scalar value found in element 0 of the array @foo". For a hash, $foo{'bar'} says "I am accessing the scalar value found in the element whose key is bar in the hash %foo".
An at-sign sigil denotes an array as a whole. @foo means "all of the array whose name is foo". (Skipping slices here; they're a way to get a list of things out of a hash or array at once. Really useful, but they'll obscure our point right now.)
Similarly, a percent sign denotes a hash as a whole. %foo is the hash named foo -- all of it at once. (Same for hash slices.)
These are all the base data types in Perl: scalars ($), arrays (@), and hashes (%). There's also the sub (& - function) type, which we'll talk more about below, and globs (* - which are beyond the scope of this answer, so we'll skip them too).
If we want more complex, nested data structures, we now have to move on to references.
A reference is, to wildly oversimplify, essentially a pointer that knows the kind of thing it points to. References are always scalars. If we want a reference to something, there are, as one would expect in Perl 5, several ways to get one:
Use the backslash (\) operator to specifically get a reference. We can do this with any of the three types of variables (scalar, hash, array) and with subroutines as well (remember, subroutines have a sigil of &). This is the only way to get a reference to a scalar variable.
Use one of the anonymous constructors to build a reference to a hash or array.
Construct a subroutine (function) reference in-line with an anonymous subroutine.
Here are examples of each:
# Define a reference to a scalar
$foo = \$bar;
# Define a reference to an array, a hash and an existing subroutine:
$x = \@w;
$y = \%zorch;
$z = \&do_stuff;
# define anonymous arrays, hashes, and subs:
$my_array = [2, 3, 'fred', 'hoplite'];
$my_hash = {foo => 'bar', baz => 'quux'};
$pi_sub = sub { return 3.14 };
There are three different ways to "dereference" (follow the pointer in) a reference.
Use stacked sigils. These read from right to left; each sigil dereferences the variable again. This is most often used to dereference a scalar. You'll usually only see two, but sometimes multiple levels of dereferencing are done at once to go through a chain of scalar values. Stacked sigils can be confusing, mostly because they add a right-to-left interpretation to a language that is read primarily left-to-right.
Use a sigil and paired braces. This is the logical equivalent of using parens to disambiguate expressions. What's inside the braces is dereferenced as the type outside the braces. Most often used to get aggregate types from a reference.
Use a chained series of dereference operators. Each dereference implies that the lefthand side is a scalar reference, so several in a row implies that the result of each dereference is another (scalar) reference. Handy for digging down into a structure to find something.
Here are examples:
# Dereference a scalar using stacked sigils.
print $$foo; # $foo is a reference to the scalar $bar
# the second $ says "please give me the value of the scalar
# I now have" (in this case that's $bar).
# Use braces and sigils to dereference a scalar, getting a hash/array
$my_hash = { a => 1, b => 2 }; # a reference to an anonymous hash
my %thing = %{ $my_hash }; # dereference the scalar
# copy the resulting hash into %thing
# Chained dereferences
# A hash of hashes:
$sample = { one => { a => 1, b => 2}, two => { aa => 4, bb => 8 }};
# To get to 8, we need to dereference twice:
$eight = $sample->{two}->{bb};
# Note that we can collapse the chain after the first deref.
$same_result = $sample->{two}{bb};
# If we wanted to get an element from $my_hash, we don't have to
# dereference it and construct the new hash. We can just use the
# dereference operator.
$x = $my_hash->{a} == $thing{a}; # $x is 1 (true).
Any combination of nestings is permitted, so we could have a hash of arrays of hashes of subs, for instance.
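As a rough illustration of such nesting, here is a hash of arrays of hashes of subs. The structure $shapes and its field names (name, calc) are made up for this sketch.

```perl
use strict;
use warnings;

# Hypothetical structure: hash -> array -> hash -> code reference.
my $shapes = {
    circle => [
        { name => 'area',          calc => sub { my $r = shift; return 3.14159 * $r ** 2 } },
        { name => 'circumference', calc => sub { my $r = shift; return 2 * 3.14159 * $r } },
    ],
};

# Walk down one level at a time, then call the sub at the bottom.
my $entry = $shapes->{circle}[0];
my $area  = $entry->{calc}->(3);

# Or do the whole chain in one expression.
my $circumference = $shapes->{circle}[1]{calc}->(3);

print "$entry->{name}: $area\n";
print "circumference: $circumference\n";
```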
Subs are a special case in that all we can do with them is copy their references, or call them:
$pi_times = sub { my $x = shift; return 3.14159 * $x };
$the_sub = $pi_times;
$z = $the_sub->(4); # calls the $pi_times anonymous sub
# returns pi * 4
The perldoc perlref man page goes into this in detail; perldoc perlreftut is a good tutorial which covers what I've outlined here in much more depth.

Hash keys and values in the right order

I've seen many times the following piece of code to join a hash to another hash
%hash1 = ('one' => "uno");
%hash2 = ('two' => "dos", 'three' => "tres");
@hash1{keys %hash2} = values %hash2;
I thought that every time the "values" or "keys" function is called, their output order was random. If this is true, how does the statement above get the keys and values in the right order on both sides?
In other words, why is there no chance of getting 'two' => 'tres' in %hash1 after merging both hashes?
Is Perl smart enough to know that if "keys" and "values" are called on the same line, then keys and values must be given in the same order?
See perldoc -f keys
So long as a given hash is unmodified you may rely on keys, values and each to repeatedly return the same order as each other.
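The guarantee can be seen with the hashes from the question. This sketch also shows a list-assignment merge (the %merged variable is an addition for illustration) that does not depend on the ordering guarantee at all.

```perl
use strict;
use warnings;

my %hash1 = ( one => 'uno' );
my %hash2 = ( two => 'dos', three => 'tres' );

# keys and values walk the unmodified %hash2 in the same order,
# so each key lines up with its own value in the slice assignment.
@hash1{ keys %hash2 } = values %hash2;

# Alternative merge: flatten %hash2 into a list of key/value pairs.
my %merged = ( one => 'uno', %hash2 );

print "$_ => $hash1{$_}\n" for sort keys %hash1;
```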
A hash is an array of linked lists. A hashing function converts the key into a number which is used as the index of the array element ("bucket") into which to store the value. More than one key can hash to the same index ("collision"), a situation handled by the linked lists.
The iterator used by keys, values and each returns the elements in an order consistent with their location in the hash. I imagine it iterates over the linked list in the first bucket, then over the linked list in the second bucket, etc. The point is that it doesn't randomize the order in which it iterates over the elements of the hash. That's why the docs guarantee the following:
So long as a given hash is unmodified you may rely on keys, values and each to repeatedly return the same order as each other.
What is "random"[1] is the bucket number to which a key will hash. Each hash has a random secret number that perturbs the hashing function. This causes the order of the elements in a hash to be different for each hash and for each run of a program.[2]
Adding elements to a hash can cause the number of buckets to increase, and it can trigger a change of the secret number (if one of the linked lists becomes abnormally long). Both of these will change the order of the elements in that hash.
$ perl -le'
my %h1 = map { $_ => 1 } "a".."j";
my %h2 = map { $_ => 1 } "a".."j";
print keys(%$_) for \%h1, \%h1, \%h2, \%h2;
'
hjfeadbigc
hjfeadbigc
bdgcifjhae
bdgcifjhae
$ perl -le'
my %h1 = map { $_ => 1 } "a".."j";
my %h2 = map { $_ => 1 } "a".."j";
print keys(%$_) for \%h1, \%h1, \%h2, \%h2;
'
dcahigjbfe
dcahigjbfe
gihacdefbj
gihacdefbj
[1] It's not quite random. If you insert two elements in a hash, the second element has a greater than 50% chance of being returned after the first by the iterator.
[2] In older versions of Perl, things weren't quite as random.

How to print specific key in an array (Perl) [duplicate]

This question already has answers here: Simple hash search by value
I recently started learning Perl, so I'm not too familiar with the functions and syntax.
If I have a Perl array and some variables,
#!/usr/bin/perl
use strict;
use warnings;
my @numbers = (a =>1, b=> 2, c => 3, d =>4, e => 5);
my $x;
my $range = 5;
$x = int(rand($range));
print "$x";
to generate a random number between 1-5, how can I get the program to print the actual key (a, b, c, etc.) instead of just the number (1, 2, 3, 4, 5)?
It seems that you want to do a reverse lookup, key-by-value, the opposite of what we get from a hash. Since a hash in list context yields a list of key/value pairs, you can reverse that list and use the resulting hash to look up by number.
A couple of corrections: you need a hash variable (not an array), and you need to add 1 to your rand integer generator to get the desired 1..5 range:
use warnings;
use strict;
use feature 'say';
my %numbers = (a => 1, b => 2, c => 3, d => 4, e => 5);
my %lookup_by_number = reverse %numbers; # values need be unique
my $range = 5;
my $x = int(rand $range) + 1;
say $lookup_by_number{$x};
Without reversing the hash, you'd need to iterate over the values of %numbers, testing each against $x to find its key.
If several keys in your original hash share the same value, you have to do it by hand, since reversing would attempt to create a hash with duplicate keys, in which case only the last one assigned remains, so you'd lose some entries. One way:
my @at_num = grep { $x == $numbers{$_} } keys %numbers;
as in the post that this was marked as a duplicate of.
But then you should build a data structure for reverse lookup, so as not to search through the list every time the information is needed. This can be a hash whose keys are the unique numbers and whose values are array references (arrayrefs) holding the corresponding keys from the original hash:
use warnings;
use strict;
use feature 'say'; # needed for say below
my %num = (a => 1, b => 2, c => 1, d => 3, e => 2); # with duplicate values
my %lookup_by_num;
foreach my $key (keys %num) {
    # autovivifies the arrayref the first time each number is seen
    push @{ $lookup_by_num{$num{$key}} }, $key;
}
say "$_ => [ @{$lookup_by_num{$_}} ]" for keys %lookup_by_num;
This prints
1 => [ c a ]
3 => [ d ]
2 => [ e b ]
A nice way to display complex data structures is via Data::Dumper, or Data::Dump (or others).
The expression @{ $lookup_by_num{ $num{$key} } } extracts the value of %lookup_by_num for the key $num{$key} and dereferences it with @{ ... }, so that push can append $key to it. The critical part is that the first time it encounters $num{$key}, it autovivifies the arrayref and its corresponding key. See this post with its references for details.
There are many ways to do it. For example, declare "numbers" as a hash rather than an array. Note that the keys come first in each key-value pair, and here you want to use your random int as the key:
my %numbers = ( 0 => 'a', 1 => 'b', 2 => 'c', 3 => 'd', 4 => 'e' );
Then you can look up the "key" as you call it using:
my $key = $numbers{$x};
Note that rand( $x ); returns a number greater than or equal to zero and less than $x. So if you want integers in the range 1-5, you must add 1 in your code: at the moment you'll get 0-4, not 1-5.
Firstly, arrays don't have keys (well, they kind of do, but they're integers and not the values you want). So I think you want a hash, not an array.
my %numbers = (a =>1, b=> 2, c => 3, d =>4, e => 5);
And if you want to get the letter, given the integer then you need the reverse of this hash:
my %rev_numbers = reverse %numbers;
Note that reversing a hash like this only works if the values in your original hash are unique (because reversing a hash makes the values into keys and hash keys are always unique).
Then, you can just look up an integer in your %rev_numbers to get its associated letter.
my $integer = 3;
say $rev_numbers{$integer}; # prints 'c'

Comparing two hashes in perl using ne

I am trying to understand a piece of code in perl, but I am having some trouble with it being sort of new to perl programming.
I have two hashes, which are being input the same (key,value) pairs in the same order in different iterations of a for loop.
Iteration 1 creates %hash1, and Iteration 2 creates %hash2.
%hash1 = (1 => 10, 2 => 20, 3=> 30);
%hash2 = (1 => 10, 2 => 20, 3=> 30);
Then a command that compares these goes as follows:
if (%hash1 ne %hash2) {print "Not Equal"; die;}
My question is:
(1) What exactly is compared in the above if statement?
(2) I tried assigning,
my $a = %hash1; my $b = %hash2;
But these give me outputs like 3/8!
What could that be?
Any help would be greatly appreciated.
ne is the string comparison operator. Its operands are strings, and thus scalars. From perldata,
If you evaluate a hash in scalar context, it returns false if the hash is empty. If there are any key/value pairs, it returns true; more precisely, the value returned is a string consisting of the number of used buckets and the number of allocated buckets, separated by a slash.
So it's checking whether the two hashes have the same number of used buckets and the same number of allocated buckets, which says nothing about whether their contents are equal.
One way to compare the hashes would be to stringify them using JSON::XS with canonical set.
JSON::XS->new->canonical(1)->encode(\%hash)
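For flat hashes like the ones in the question, a comparison can also be written with core Perl only. This is a sketch, not a general solution: hashes_equal is a hypothetical helper that compares values as strings and does not descend into nested references.

```perl
use strict;
use warnings;

# Hypothetical helper: shallow equality of two hash references.
sub hashes_equal {
    my ($x, $y) = @_;
    return 0 unless scalar(keys %$x) == scalar(keys %$y);
    for my $key (keys %$x) {
        return 0 unless exists $y->{$key};
        my ($vx, $vy) = ($x->{$key}, $y->{$key});
        return 0 if defined($vx) xor defined($vy);   # undef only matches undef
        next unless defined $vx;
        return 0 unless $vx eq $vy;                  # string comparison of values
    }
    return 1;
}

my %hash1 = (1 => 10, 2 => 20, 3 => 30);
my %hash2 = (1 => 10, 2 => 20, 3 => 30);
print hashes_equal(\%hash1, \%hash2) ? "Equal\n" : "Not Equal\n";
```

Checking the key counts first means that one pass over the keys of the first hash is enough to detect any missing or extra key.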
The Data::Compare module, available on CPAN, can compare hashes. It works as follows:
use Data::Compare; # exports subroutine: Compare() !
...
my %hash1 = (1 => 10, 2 => 20, 3 => 30);
my %hash2 = (1 => 10, 2 => 20, 3 => 30);
# This won't work:
# if (%hash1 ne %hash2) {print "Not Equal"; die;}
# This works:
if( ! Compare(\%hash1, \%hash2) ) { print "Not Equal"; die; }
...
This is not a core module, so you'll have to install it. It is also available under activeperl/windows (in their default repository).
Regards,
rbo

Perl Hash Slice, Replication x Operator, and sub params

Ok, I understand perl hash slices, and the "x" operator in Perl, but can someone explain the following code example from here (slightly simplified)?
sub test{
my %hash;
@hash{@_} = (undef) x @_;
}
Example Call to sub:
test('one', 'two', 'three');
This line is what throws me:
@hash{@_} = (undef) x @_;
It is creating a hash where the keys are the parameters to the sub and initializing to undef, so:
%hash:
'one' => undef,
'two' => undef,
'three' => undef
The rvalue of the x operator should be a number; how is it that @_ is interpreted as the length of the sub's parameter array? I would expect you'd at least have to do this:
@hash{@_} = (undef) x scalar @_;
To figure out this code you need to understand three things:
The repetition operator. The x operator is the repetition operator. In list context, if the operator's left-hand argument is enclosed in parentheses, it will repeat the items in a list:
my @x = ('foo') x 3; # ('foo', 'foo', 'foo')
Arrays in scalar context. When an array is used in scalar context, it returns its size. The x operator imposes scalar context on its right-hand argument.
my @y = (7,8,9);
my $n = 10 * @y; # $n is 30
Hash slices. The hash slice syntax provides a way to access multiple hash items at once. A hash slice can retrieve hash values, or it can be assigned to. In the case at hand, we are assigning to a hash slice.
# Right side creates a list of repeated undef values -- the size of @_.
# We assign that list to a set of hash keys -- also provided by @_.
@hash{@_} = (undef) x @_;
Less obscure ways to do the same thing:
@hash{@_} = ();
$hash{$_} = undef for @_;
In scalar context, an array evaluates to its length. From perldoc perldata:
If you evaluate an array in scalar context, it returns the length of the array. (Note that this is not true of lists, which return the last value, like the C comma operator, nor of built-in functions, which return whatever they feel like returning.)
Although I cannot find more information on it currently, it seems that the replication operator evaluates its second argument in scalar context, causing the array to evaluate to its length.
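The scalar-context behaviour of the right-hand operand can be checked directly. A small sketch, where make_set is a hypothetical wrapper around the line from the question and @args is sample data:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the line being discussed.
sub make_set {
    my %hash;
    @hash{@_} = (undef) x @_;    # @_ on the right is evaluated as a count
    return \%hash;
}

my $set = make_set('one', 'two', 'three');

# The same effect in isolation: x sees the array's length, here 3.
my @args     = ('one', 'two', 'three');
my @repeated = ('-') x @args;

print join(',', sort keys %$set), "\n";    # one,three,two
print scalar(@repeated), "\n";             # 3
```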