Why I can use #list to call an array, but can't use %dict to call a hash in perl? [duplicate] - perl

This question already has answers here:
Why do you need $ when accessing array and hash elements in Perl?
(9 answers)
Closed 8 years ago.
Today I start my perl journey, and now I'm exploring the data type.
My code looks like:
#list=(1,2,3,4,5);
%dict=(1,2,3,4,5);
print "$list[0]\n"; # using [ ] to wrap index
print "$dict{1}\n"; # using { } to wrap key
print "#list[2]\n";
print "%dict{2}\n";
it seems $ + var_name works for both array and hash, but # + var_name can be used to call an array, meanwhile % + var_name can't be used to call a hash.
Why?

#list[2] works because it is a slice of a list.
In Perl 5, a sigil indicates--in a non-technical sense--the context of your expression. Except from some of the non-standard behavior that slices have in a scalar context, the basic thought is that the sigil represents what you want to get out of the expression.
If you want a scalar out of a hash, it's $hash{key}.
If you want a scalar out of an array, it's $array[0]. However, Perl allows you to get slices of the aggregates. And that allows you to retrieve more than one value in a compact expression. Slices take a list of indexes. So,
#list = #hash{ qw<key1 key2> };
gives you a list of items from the hash. And,
#list2 = #list[0..3];
gives you the first four items from the array. --> For your case, #list[2] still has a "list" of indexes, it's just that list is the special case of a "list of one".
As scalar and list contexts were rather well defined, and there was no "hash context", it stayed pretty stable at $ for scalar and # for "lists" and until recently, Perl did not support addressing any variable with %. So neither %hash{#keys} nor %hash{key} had meaning. Now, however, you can dump out pairs of indexes with values by putting the % sigil on the front.
my %hash = qw<a 1 b 2>;
my #list = %hash{ qw<a b> }; # yields ( 'a', 1, 'b', 2 )
my #l2 = %list[0..2]; # yields ( 0, 'a', 1, '1', 2, 'b' )
So, I guess, if you have an older version of Perl, you can't, but if you have 5.20, you can.
But for a completist's sake, slices have a non-intuitive way that they work in a scalar context. Because the standard behavior of putting a list into a scalar context is to count the list, if a slice worked with that behavior:
( $item = #hash{ #keys } ) == scalar #keys;
Which would make the expression:
$item = #hash{ #keys };
no more valuable than:
scalar #keys;
So, Perl seems to treat it like the expression:
$s = ( $hash{$keys[0]}, $hash{$keys[1]}, ... , $hash{$keys[$#keys]} );
And when a comma-delimited list is evaluated in a scalar context, it assigns the last expression. So it really ends up that
$item = #hash{ #keys };
is no more valuable than:
$item = $hash{ $keys[-1] };
But it makes writing something like this:
$item = $hash{ source1(), source2(), #array3, $banana, ( map { "$_" } source4()};
slightly easier than writing:
$item = $hash{ [source1(), source2(), #array3, $banana, ( map { "$_" } source4()]->[-1] }
But only slightly.

Arrays are interpolated within double quotes, so you see the actual contents of the array printed.
On the other hand, %dict{1} works, but is not interpolated within double quotes. So, something like my %partial_dict = %dict{1,3} is valid and does what you expect i.e. %partial_dict will now have the value (1,2,3,4). But "%dict{1,3}" (in quotes) will still be printed as %dict{1,3}.
Perl Cookbook has some tips on printing hashes.

Related

Perl assigning #ARGV array to a variable

When I assign the Perl #ARGV array to a variable, if I don't use the quotes, it gives me the number of strings in the array, and not the strings in the array.
What is this called - I thought it was dereferencing, but it is not. Right now I am calling it one more thing I need to memorize in Perl.
#!/usr/bin/perl
use strict ;
use warnings;
my $str = "#ARGV" ;
#my $str = #ARGV ;
#my $str = 'geeks, for, geeks';
my #spl = split(', ' , $str);
foreach my $i (#spl) {
print "$i\n" ;
}
If you assign an array to a scalar in Perl, you get the number of elements in the array.
my #array = (1, 1, 2, 3, 5, 8, 13);
my $scalar = #array; # $scalar contains 7
This is known as "evaluating an array in scalar context".
If you expand an array in a double-quoted string in Perl, you get the elements of the array separated by spaces.
my #array = (1, 1, 2, 3, 5, 8, 13);
my $scalar = "#array"; $scalar contains '1 1 2 3 5 8 13'
This is known as "interpolating an array in a double-quoted string".
Actually, in that second example, the elements are separated by the current contents of the $" variable. And the default value of that variable is a space. But you can change it.
my #array = (1, 1, 2, 3, 5, 8, 13);
$" = '+';
my $scalar = "#array"; $scalar contains '1+1+2+3+5+8+13'
To store a reference to the array, you either take a reference to the array.
my $scalar = \#array;
Or create a new, anonymous array using the elements of of the original array.
my $scalar = [ #array ];
Because we don't know what you are actually trying to do, we can't recommend which of these is the best approach.
Perl works by context. The one that you see here is scalar versus array context. In scalar context, you want one thing, so Perl gives you the one thing the probably makes sense. Recognize the context and you can probably suss out what's going on.
When you have a scalar on the left side of an assignment, you have scalar context because you want to end up with one thing:
my $one_thing = ...
Put an array on the right side, and you have an array in scalar context. The design of Perl decided that the most common thing people probably want in that case is the number of elements:
my $one_thing = #array;
This works with some other builtins too. The localtime builtin returns a single string in scalar context (a timestamp):
my $uid = localtime; # Tue Mar 17 11:39:47 2020
But, in array context, you want possibly multiple things (where that could be two, or one, or zero, or ten thousand, or...). In that case, localtime returns a list of things:
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
localtime();
You already know some of this though, probably. The + operator uses its operands as numbers, but the . operator uses them as strings:
my $sum = '123' + 14;
my $string = '123' . 14;
Perl's philosophy is that it is going to try to do what the verbs (operators, builtins, functions) are trying to do, not what the nouns (variable or value type) might imply. Many languages tell the verbs what to do based on the nouns, so fitting Perl into one of those mental modules usually doesn't work out. You don't have to memorize a lot; I've been doing this quite awhile and I still refer to the docs often.
We go through a lot of this philosophical explanation in Learning Perl.
The idiom you are looking for is one of
my $str = \#ARGV;
my $str = [ #ARGV ];
These both assign an array reference to the scalar variable $str. You can then get back the elements of #ARGV when you dereference $str. For example,
for my $i (#$str) {
print "$i\n";
}
(Some people prefer #{$str}, which does the same thing)
\ is the reference operator, which returns a reference to whatever is on its right hand side.
[...] creates a new array reference out of whatever is contained between the brackets.
"#array" is a stringify operation on an array, and equivalent to join($", #array)
And finally, a scalar assignment from an array, like
$n = #array
returns the number of elements in the array.
The total number of elements in the Array can sometimes be required. Since such situations are often encountered, we also have to learn how to get the total number.
#array = ("a".."z");
$re = $#array;
print ("$re\n");
We may need to add one to the number we get to reach the total number.
$ay = ("a", "b", "c");
$re = $#ay;
$re = $re +1;
print ("$re\n");
result: 3

Does iterating over a hash reference require implicitly copying it in perl?

Lets say I have a large hash and I want to iterate over the contents of it contents. The standard idiom would be something like this:
while(($key, $value) = each(%{$hash_ref})){
///do something
}
However, if I understand my perl correctly this is actually doing two things. First the
%{$hash_ref}
is translating the ref into list context. Thus returning something like
(key1, value1, key2, value2, key3, value3 etc)
which will be stored in my stacks memory. Then the each method will run, eating the first two values in memory (key1 & value1) and returning them to my while loop to process.
If my understanding of this is right that means that I have effectively copied my entire hash into my stacks memory only to iterate over the new copy, which could be expensive for a large hash, due to the expense of iterating over the array twice, but also due to potential cache hits if both hashes can't be held in memory at once. It seems pretty inefficient. I'm wondering if this is what really happens, or if I'm either misunderstanding the actual behavior or the compiler optimizes away the inefficiency for me?
Follow up questions, assuming I am correct about the standard behavior.
Is there a syntax to avoid copying of the hash by iterating over it values in the original hash? If not for a hash is there one for the simpler array?
Does this mean that in the above example I could get inconsistent values between the copy of my hash and my actual hash if I modify the hash_ref content within my loop; resulting in $value having a different value then $hash_ref->($key)?
No, the syntax you quote does not create a copy.
This expression:
%{$hash_ref}
is exactly equivalent to:
%$hash_ref
and assuming the $hash_ref scalar variable does indeed contain a reference to a hash, then adding the % on the front is simply 'dereferencing' the reference - i.e. it resolves to a value that represents the underlying hash (the thing that $hash_ref was pointing to).
If you look at the documentation for the each function, you'll see that it expects a hash as an argument. Putting the % on the front is how you provide a hash when what you have is a hashref.
If you wrote your own subroutine and passed a hash to it like this:
my_sub(%$hash_ref);
then on some level you could say that the hash had been 'copied', since inside the subroutine the special #_ array would contain a list of all the key/value pairs from the hash. However even in that case, the elements of #_ are actually aliases for the keys and values. You'd only actually get a copy if you did something like: my #args = #_.
Perl's builtin each function is declared with the prototype '+' which effectively coerces a hash (or array) argument into a reference to the underlying data structure.
As an aside, starting with version 5.14, the each function can also take a reference to a hash. So instead of:
($key, $value) = each(%{$hash_ref})
You can simply say:
($key, $value) = each($hash_ref)
No copy is created by each (though you do copy the returned values into $key and $value through assignment). The hash itself is passed to each.
each is a little special. It supports the following syntaxes:
each HASH
each ARRAY
As you can see, it doesn't accept an arbitrary expression. (That would be each EXPR or each LIST). The reason for that is to allow each(%foo) to pass the hash %foo itself to each rather than evaluating it in list context. each can do that because it's an operator, and operators can have their own parsing rules. However, you can do something similar with the \% prototype.
use Data::Dumper;
sub f { print(Dumper(#_)); }
sub g(\%) { print(Dumper(#_)); } # Similar to each
my %h = (a=>1, b=>2);
f(%h); # Evaluates %h in list context.
print("\n");
g(%h); # Passes a reference to %h.
Output:
$VAR1 = 'a'; # 4 args, the keys and values of the hash
$VAR2 = 1;
$VAR3 = 'b';
$VAR4 = 2;
$VAR1 = { # 1 arg, a reference to the hash
'a' => 1,
'b' => 2
};
%{$h_ref} is the same as %h, so all of the above applies to %{$h_ref} too.
Note that the hash isn't copied even if it is flattened. The keys are "copied", but the values are returned directly.
use Data::Dumper;
my %h = (abc=>"def", ghi=>"jkl");
print(Dumper(\%h));
$_ = uc($_) for %h;
print(Dumper(\%h));
Output:
$VAR1 = {
'abc' => 'def',
'ghi' => 'jkl'
};
$VAR1 = {
'abc' => 'DEF',
'ghi' => 'JKL'
};
You can read more about this here.

Test, if a hash key exists and print values of Hash in perl

If a key exists in a array, I want to print that key and its values from hash. Here is the code I wrote.
for($i=0;$i<#array.length;$i++)
{
if (exists $hash{$array[$i]})
{
print OUTPUT $array[$i],"\n";
}
}
From the above code, I am able to print keys. But I am not sure how to print values of that key.
Can someone help me?
Thanks
#array.length is syntactically legal, but it's definitely not what you want.
#array, in scalar context, gives you the number of elements in the array.
The length function, with no argument, gives you the length of $_.
The . operator performs string concatenation.
So #array.length takes the number of elements in #array and the length of the string contained in $_, treats them as strings, and joins them together. $i < ... imposes a numeric context, so it's likely to be treated as a number -- but surely not the one you want. (If #array has 15 elements and $_ happens to be 7 characters long, the number should be 157, a meaningless value.)
The right way to compute the number of elements in #array is just #array in scalar context -- or, to make it more explicit, scalar #array.
To answer your question, if $array[$i] is a key, the corresponding value is $hash{$array[$i]}.
But a C-style for loop is not the cleanest way to traverse an array, especially if you only need the value, not the index, on each iteration.
foreach my $elem (#array) {
if (exists $hash{$elem}) {
print OUTPUT "$elem\n";
}
}
Some alternative methods using hash slices:
foreach (#hash{#array}) { print OUTPUT "$_\n" if defined };
print OUTPUT join("\n",grep {defined} #hash{#array});
(For those who like golfing).

What perl code samples can lead to undefined behaviour?

These are the ones I'm aware of:
The behaviour of a "my" statement modified with a statement modifier conditional or loop construct (e.g. "my $x if ...").
Modifying a variable twice in the same statement, like $i = $i++;
sort() in scalar context
truncate(), when LENGTH is greater than the length of the file
Using 32-bit integers, "1 << 32" is undefined. Shifting by a negative number of bits is also undefined.
Non-scalar assignment to "state" variables, e.g. state #a = (1..3).
One that is easy to trip over is prematurely breaking out of a loop while iterating through a hash with each.
#!/usr/bin/perl
use strict;
use warnings;
my %name_to_num = ( one => 1, two => 2, three => 3 );
find_name(2); # works the first time
find_name(2); # but fails this time
exit;
sub find_name {
my($target) = #_;
while( my($name, $num) = each %name_to_num ) {
if($num == $target) {
print "The number $target is called '$name'\n";
return;
}
}
print "Unable to find a name for $target\n";
}
Output:
The number 2 is called 'two'
Unable to find a name for 2
This is obviously a silly example, but the point still stands - when iterating through a hash with each you should either never last or return out of the loop; or you should reset the iterator (with keys %hash) before each search.
These are just variations on the theme of modifying a structure that is being iterated over:
map, grep and sort where the code reference modifies the list of items to sort.
Another issue with sort arises where the code reference is not idempotent (in the comp sci sense)--sort_func($a, $b) must always return the same value for any given $a and $b.

What is the difference between the scalar and list contexts in Perl?

What is the difference between the scalar and list contexts in Perl and does this have any parallel in other languages such as Java or Javascript?
Various operators in Perl are context sensitive and produce different results in list and scalar context.
For example:
my(#array) = (1, 2, 4, 8, 16);
my($first) = #array;
my(#copy1) = #array;
my #copy2 = #array;
my $count = #array;
print "array: #array\n";
print "first: $first\n";
print "copy1: #copy1\n";
print "copy2: #copy2\n";
print "count: $count\n";
Output:
array: 1 2 4 8 16
first: 1
copy1: 1 2 4 8 16
copy2: 1 2 4 8 16
count: 5
Now:
$first contains 1 (the first element of the array), because the parentheses in the my($first) provide an array context, but there's only space for one value in $first.
both #copy1 and #copy2 contain a copy of #array,
and $count contains 5 because it is a scalar context, and #array evaluates to the number of elements in the array in a scalar context.
More elaborate examples could be constructed too (the results are an exercise for the reader):
my($item1, $item2, #rest) = #array;
my(#copy3, #copy4) = #array, #array;
There is no direct parallel to list and scalar context in other languages that I know of.
Scalar context is what you get when you're looking for a single value. List context is what you get when you're looking for multiple values. One of the most common places to see the distinction is when working with arrays:
#x = #array; # copy an array
$x = #array; # get the number of elements in an array
Other operators and functions are context sensitive as well:
$x = 'abc' =~ /(\w+)/; # $x = 1
($x) = 'abc' =~ /(\w+)/; # $x = 'abc'
#x = localtime(); # (seconds, minutes, hours...)
$x = localtime(); # 'Thu Dec 18 10:02:17 2008'
How an operator (or function) behaves in a given context is up to the operator. There are no general rules for how things are supposed to behave.
You can make your own subroutines context sensitive by using the wantarray function to determine the calling context. You can force an expression to be evaluated in scalar context by using the scalar keyword.
In addition to scalar and list contexts you'll also see "void" (no return value expected) and "boolean" (a true/false value expected) contexts mentioned in the documentation.
This simply means that a data-type will be evaluated based on the mode of the operation. For example, an assignment to a scalar means the right-side will be evaluated as a scalar.
I think the best means of understanding context is learning about wantarray. So imagine that = is a subroutine that implements wantarray:
sub = {
return if ( ! defined wantarray ); # void: just return (doesn't make sense for =)
return #_ if ( wantarray ); # list: return the array
return $#_ + 1; # scalar: return the count of the #_
}
The examples in this post work as if the above subroutine is called by passing the right-side as the parameter.
As for parallels in other languages, yes, I still maintain that virtually every language supports something similar. Polymorphism is similar in all OO languages. Another example, Java converts objects to String in certain contexts. And every untyped scripting language i've used has similar concepts.