Why do you need $ when accessing array and hash elements in Perl?

Since arrays and hashes can only contain scalars in Perl, why do you have to use the $ to tell the interpreter that the value is a scalar when accessing array or hash elements? In other words, assuming you have an array @myarray and a hash %myhash, why do you need to do:
$x = $myarray[1];
$y = $myhash{'foo'};
instead of just doing:
$x = myarray[1];
$y = myhash{'foo'};
Why are the above ambiguous?
Wouldn't it be illegal Perl code if it was anything but a $ in that place? For example, aren't all of the following illegal in Perl?
@var[0];
@var{'key'};
%var[0];
%var{'key'};

I've just used
my $x = myarray[1];
in a program and, to my surprise, here's what happened when I ran it:
$ perl foo.pl
Flying Butt Monkeys!
That's because the whole program looks like this:
$ cat foo.pl
#!/usr/bin/env perl
use strict;
use warnings;
sub myarray {
print "Flying Butt Monkeys!\n";
}
my $x = myarray[1];
So myarray calls a subroutine passing it a reference to an anonymous array containing a single element, 1.
That's another reason you need the sigil on an array access.

Slices aren't illegal:
@slice = @myarray[1, 2, 5];
@slice = @myhash{qw/foo bar baz/};
And I suspect that's part of the reason why you need to specify if you want to get a single value out of the hash/array or not.

The sigil gives you the return type of the access. So if something starts with @, you know that it returns a list. If it starts with $, it returns a scalar.
Now if there is only an identifier after the sigil (like $foo or @foo), then it's a simple variable access. If it's followed by a [, it is an access on an array; if it's followed by a {, it's an access on a hash.
# variables
$foo
@foo
# accesses
$stuff{blubb} # accesses %stuff, returns a scalar
@stuff{@list} # accesses %stuff, returns a list
$stuff[blubb] # accesses @stuff, returns a scalar
# (and calls the blubb() function)
@stuff[blubb] # accesses @stuff, returns a list
Some human languages have very similar concepts.
However many programmers found that confusing, so Perl 6 uses an invariant sigil.
In general the Perl 5 compiler wants to know at compile time if something is in list or in scalar context, so without the leading sigil some terms would become ambiguous.

This is valid Perl: @var[0]. It is an array slice of length one. @var[0,1] would be an array slice of length two.
@var['key'] is not valid Perl because arrays can only be indexed by numbers, and the other two (%var[0] and %var['key']) are not valid Perl because hash slices use the {} to index the hash. (Since Perl 5.20, the % sigil with [...] or {...} is accepted as an index/value or key/value slice.)
@var{'key'} and @var{0} are both valid hash slices, though. Obviously it isn't normal to take slices of length one, but it is certainly valid.
See the slice section of the perldata perldoc for more information about slicing in Perl.
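To make the element-versus-slice distinction concrete, here is a small sketch. The variable names and sample data (@letters, %ages) are made up for illustration; only the sigil and bracket rules come from the answers above.

```perl
use strict;
use warnings;

# Hypothetical sample data, not from the question.
my @letters = ('a' .. 'f');
my %ages    = (tom => 30, ann => 25, bob => 41);

my $second = $letters[1];            # $ + [ ]: one scalar out of the array
my @picked = @letters[1, 2, 5];      # @ + [ ]: array slice, a list of three scalars

my @idx    = (0);
my @single = @letters[@idx];         # a slice of length one is still a list

my $age    = $ages{tom};             # $ + { }: one scalar out of the hash
my @some   = @ages{qw(tom bob)};     # @ + { }: hash slice, a list of two values

print "$second @picked $age @some\n";   # prints "b b c f 30 30 41"
```

The sigil says what comes back (one scalar or a list), and the bracket type says which kind of container is being indexed.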

People have already pointed out that you can have slices and contexts, but sigils are there to separate the things that are variables from everything else. You don't have to know all of the keywords or subroutine names to choose a sensible variable name. It's one of the big things I miss about Perl in other languages.

I can think of one way that
$x = myarray[1];
is ambiguous - what if you wanted an array called m?
$x = m[1];
How can you tell that apart from a regex match?
In other words, the syntax is there to help the Perl interpreter, well, interpret!

In Perl 5 (to be changed in Perl 6) a sigil indicates the context of your expression.
You want a particular scalar out of a hash so it's $hash{key}.
You want the value of a particular slot out of an array, so it's $array[0].
However, as pointed out by zigdon, slices are legal. They interpret those expressions in a list context.
If you want a list of one value from a hash, @hash{key} works.
Larger lists work as well, like @hash{qw<key1 key2 ... key_n>}.
If you want a couple of slots out of an array, @array[0,3,5..7,$n..$n+5] works.
@array[0] is a list of size 1.
There is no "hash context", so neither %hash{@keys} nor %hash{key} has meaning.
So you have "@" + "array[0]" <=> < sigil = context > + < indexing expression > as the complete expression.

The sigil provides the context for the access:
$ means scalar context (a scalar variable or a single element of a hash or an array)
@ means list context (a whole array or a slice of a hash or an array)
% is an entire hash
In Perl 5 you need the sigils ($ and @) because the default interpretation of a bareword identifier is that of a subroutine call (thus eliminating the need to use & in most cases).

Related

Perl assign array elements to hash user defined key

Below is code in which I need help.
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my @arrayElements = ('Array Functions');
print join(", ", @arrayElements);
### Output => Array Functions
my %hashElements = ();
I want to assign the content of @arrayElements to $hashElements{Item}.
I'm probably missing some core concept or going about this the wrong way, and I've been struggling with it for a while.
You seem to be missing some core concepts of Perl (or programming in general). If you are learning Perl through a book or online tutorial, I suggest you re-read the chapters on arrays and hashes.
Let's look at the things involved here. You have:
@arrayElements, which is an array. It contains a list with one element, the string 'Array Functions'.
%hashElements, which is a hash. It's empty.
$hashElements{Item}, which is a scalar value. You want to set this.
You say you want $hashElements{Item} to have the value 'Array Functions', which you have as the first element in your array @arrayElements.
$hashElements{Item} = $arrayElements[0];
And that's it. Both $hashElements{Item} and $arrayElements[0] are scalar values. That's why their sigils (the sign at the front) change from an @ (for array) or % (for hash) to a $. You can distinguish whether the value came from a hash or an array by the brackets used to access the elements. [] is for arrays, and {} is for hashes.
You cannot do the following though.
$hashElements{Item} = @arrayElements;
Because $hashElements{Item} is a scalar, the thing on the right hand side of the assignment will be treated in scalar context. An array in scalar context gets converted to the number of elements in the array, so this would assign 1. That's not what you want.
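A minimal sketch of the three things you could plausibly mean by "assign the array to a hash key". The snake_case names, the hash keys, and the second array item are my own additions for illustration.

```perl
use strict;
use warnings;

# Second item added only so the element count is visibly different from 1.
my @array_elements = ('Array Functions', 'Hash Functions');
my %hash_elements;

$hash_elements{first} = $array_elements[0];    # one scalar element
$hash_elements{count} = @array_elements;       # array in scalar context: element count
$hash_elements{all}   = \@array_elements;      # reference to the whole array
$hash_elements{copy}  = [ @array_elements ];   # reference to an independent copy

print "$hash_elements{first}\n";                        # Array Functions
print "$hash_elements{count}\n";                        # 2
print join(", ", @{ $hash_elements{all} }), "\n";       # Array Functions, Hash Functions
```

If you need every element under one key, storing a reference (\@array_elements or a copy in [ ... ]) is the usual way, because a hash value can only ever be a single scalar.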
You should really read up more about this, and also pick better names for your variables. Your example is very confusing. In general, we don't do $CamelCase for variable names in Perl, but instead use $snake_case, which is easier to read and type.
Take a look at the following resources to learn more about the concepts I've mentioned above.
Perl Maven, perldata, perldsc

How to write two expressions as just one?

Can I write these two:
$var = tied $$var; # History=HASH(0x192a540)
$var->{ desc }; # object description info
By one expression:
${tied $$var}->{ desc };
I get the error:
Not a SCALAR reference at ...
The syntax SOMETHING->{key} tries to perform hash lookup in a reference SOMETHING. Here, your SOMETHING is ${...}, i.e. a scalar dereference.
Instead, you want
normal parentheses: (...)->{key}
hash access without an extra level of dereferencing: ${...}{key}.
The -> dereference operator is only optional between two subscripts. I.e. $foo{bar}[42] and $foo{bar}->[42] are equivalent and access a value from the %foo hash. But $foo->{bar}[42] is completely different: This accesses a value in the $foo hash reference.
The syntax %{SOMETHING}{key} is not correct because that dereferences SOMETHING as a hash, then accesses an entry. But the syntax for accessing an entry in a hash %SOMETHING is $SOMETHING{key}, not %SOMETHING{key}. The sigil % of a hash turns into a scalar sigil $ because you get a scalar entry out of the hash. This is known to be confusing, and has been fixed in Perl 6.
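Here is a sketch of the two working forms. Since I don't have your tied class, get_object() is a hypothetical stand-in for tied $$var: any expression that returns a hash reference behaves the same way.

```perl
use strict;
use warnings;

# Hypothetical stand-in for "tied $$var"; it just returns a hash reference.
sub get_object { return { desc => 'object description info' } }

my $via_parens = (get_object())->{desc};    # parenthesise the expression, then use ->
my $via_block  = ${ get_object() }{desc};   # dereference block followed by a subscript

# The arrow is optional only between two subscripts.
my %foo = (bar => [ (0) x 42, 'answer' ]);
my $ref = \%foo;
my $e1  = $foo{bar}[42];      # element of the hash %foo
my $e2  = $foo{bar}->[42];    # same thing, arrow written out
my $e3  = $ref->{bar}[42];    # goes through the hash reference $ref

print "$via_parens | $via_block | $e1 $e2 $e3\n";
```

Applied to the question, the same two shapes would be (tied $$var)->{desc} and ${ tied $$var }{desc}.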

Storing a hash in a hash

I'm having troubles with a Perl script. I try to store a hash in a hash. The script is trivial:
use Data::Dumper;
my %h1=();
$h1{name}="parent";
my %h2=();
$h2{name}="child";
$h1{nested}=%h2; # store hash h2 in hash h1
print "h2:\n";
print Dumper(%h2); # works
print "h1{nested}:\n";
print Dumper($h1{nested}); # fails
The results:
h2:
$VAR1 = 'name';
$VAR2 = 'child';
h1{nested}:
$VAR1 = '1/8';
Why is the $h1{nested} not dumped as a hash, but as some kind of weird scalar (1/8)?
PS: even if this question sounds trivial - I searched SO but did not find that it was asked before.
PPS: my Perl is v5.10.1 (*) built for x86_64-linux-gnu-thread-multi
(with 53 registered patches, see perl -V for more detail)
You can only store a hashref in a hash:
$h1{nested}=\%h2;
and then you would access %h2's name by doing
$h1{nested}->{name}
In your code, %h2 is forced to scalar context, which shows you that "1/8" value, and stores that.
In Perl the values stored in a list (hash or array) are always scalars. Given this, the only way to store a hash inside another hash is to store a reference to it.
$h1{'nested'} = \%h2;
or also
$h1{'nested'} = { 'name'=>'child' };
(the braces on the right-hand side create a reference to an anonymous hash).
BTW, leaving the literal keys unquoted is usually considered bad practice.
Why is the $h1{nested} not dumped as a hash, but as some kind of weird scalar (1/8)?
Because you're storing it in a scalar context!
When you do this:
$h1{nested} = %h2;
You're storing a scalar. Since %h2 is a hash, you're given the ol' fraction string. According to the Perldoc website
If you evaluate a hash in scalar context, it returns false if the hash is empty. If there are any key/value pairs, it returns true; more precisely, the value returned is a string consisting of the number of used buckets and the number of allocated buckets, separated by a slash.
That explains the 1/8 you're getting.
What you need to do is store the hash as a reference in the other hash. As others pointed out, it should be:
$h1{nested} = \%h2;
The backslash before the hash's name gives you a reference to the hash (roughly, the memory location where the hash is stored). You can use the curly braces, but I prefer the backslash notation.
Take a look at perldoc perlreftut on your computer (or on the webpage I've linked to). This will tell you how to make such things as a list of lists, hashes of hashes, lists of hashes, and hashes of lists. Just a word of warning: If you get too complex, it'll be hard to maintain, so once you've had your fun, take a look at perldoc's Perl Object Orientation Programming Tutorial.
The perldoc command contains lots of Perl documentation, including for all Perl functions, the Perl modules installed on your system, and even basic information about the Perl language.
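To round this off, a short sketch of both notations side by side. The copied key is my own addition, there to show that the backslash form shares the original hash while the brace form makes a new one. (What %h2 yields in scalar context varies between Perl versions, so I leave that part out.)

```perl
use strict;
use warnings;

my %h1 = (name => 'parent');
my %h2 = (name => 'child');

$h1{nested} = \%h2;        # reference to %h2 itself
$h1{copied} = { %h2 };     # reference to a new anonymous hash holding a shallow copy

$h2{name} = 'renamed';     # seen through the reference, not through the copy

print $h1{nested}{name}, "\n";     # renamed
print $h1{copied}{name}, "\n";     # child
print ref($h1{nested}), "\n";      # HASH
```

Which one you want depends on whether later changes to %h2 should show up inside %h1.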

Are Perl subroutines call-by-reference or call-by-value?

I'm trying to figure out Perl subroutines and how they work.
From perlsub I understand that subroutines are call-by-reference and that an assignment (like my(@copy) = @_;) is needed to turn them into call-by-value.
In the following, I see that change is called-by-reference because "a" and "b" are changed into "x" and "y". But I'm confused about why the array isn't extended with an extra element "z"?
use strict;
use Data::Dumper;
my @a = ( "a", "b" );
change(@a);
print Dumper(\@a);
sub change
{
    @_[0] = "x";
    @_[1] = "y";
    @_[2] = "z";
}
Output:
$VAR1 = [
'x',
'y'
];
In the following, I pass a hash instead of an array. Why isn't the key changed from "a" to "x"?
use strict;
use Data::Dumper;
my %a = ( "a" => "b" );
change(%a);
print Dumper(\%a);
sub change
{
    @_[0] = "x";
    @_[1] = "y";
}
Output:
$VAR1 = {
'a' => 'y'
};
I know the real solution is to pass the array or hash by reference using \@ or \%, but I'd like to understand the behaviour of these programs exactly.
Perl always passes by reference. It's just that sometimes the caller passes temporary scalars.
The first thing you have to realise is that the arguments of subs can be one and only one thing: a list of scalars.* One cannot pass arrays or hashes to them. Arrays and hashes are evaluated, returning a list of their content. That means that
f(@a)
is the same** as
f($a[0], $a[1], $a[2])
Perl passes by reference. Specifically, Perl aliases each of the arguments to the elements of @_. Modifying the elements of @_ will change the scalars returned by $a[0], etc. and thus will modify the elements of @a.
The second thing of importance is that the key of an array or hash element determines where the element is stored in the structure. Otherwise, $a[4] and $h{k} would require looking at each element of the array or hash to find the desired value. This means that the keys aren't modifiable. Moving a value requires creating a new element with the new key and deleting the element at the old key.
As such, whenever you get the keys of an array or hash, you get a copy of the keys. Fresh scalars, so to speak.
Back to the question,
f(%h)
is the same** as
f(
my $k1 = "a", $h{a},
my $k2 = "b", $h{b},
my $k3 = "c", $h{c},
)
@_ is still aliased to the values returned by %h, but some of those are just temporary scalars used to hold a key. Changing those will have no lasting effect.
* — Some built-ins (e.g. grep) are more like flow control statements (e.g. while). They have their own parsing rules, and thus aren't limited to the conventional model of a sub.
** — Prototypes can affect how the argument list is evaluated, but it will still result in a list of scalars.
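The aliasing described above can be checked with a small sketch. shout() is a hypothetical helper of mine, not something from the question; it writes through every element of @_ and then grows @_.

```perl
use strict;
use warnings;

# Hypothetical helper: upper-cases each argument in place, then extends @_.
sub shout {
    $_ = uc $_ for @_;     # each element of @_ aliases one scalar from the caller
    push @_, 'EXTRA';      # grows @_ only; the caller's container is untouched
    return scalar @_;
}

my @words = ('a', 'b');
shout(@words);
print "@words\n";              # A B   (elements changed, no third element)

my %h = (key => 'value');
shout(%h);
print join(',', %h), "\n";     # key,VALUE   (value changed, key copy discarded)
```

The array elements and the hash value change because they are aliased; the hash key and the pushed element live only in temporaries and in @_.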
Perl's subroutines accept parameters as flat lists of scalars. An array passed as a parameter is for all practical purposes a flat list too. Even a hash is treated as a flat list of one key followed by one value, followed by one key, etc.
A flat list is not passed as a reference unless you do so explicitly. The fact that modifying $_[0] modifies $a[0] is because the elements of @_ become aliases for the elements passed as parameters. Modifying $_[0] is the same as modifying $a[0] in your example. But while this is approximately similar to the common notion of "pass by reference" as it applies to any programming language, this isn't specifically passing a Perl reference; Perl's references are different (and indeed "reference" is an overloaded term). An alias (in Perl) is a synonym for something, whereas a reference is similar to a pointer to something.
As perlsyn states, if you assign to @_ as a whole, you break its alias status. Also note, if you try to modify $_[0], and $_[0] happens to be a literal instead of a variable, you'll get an error. On the other hand, modifying $_[0] does modify the caller's value if it is modifiable. So in example one, changing $_[0] and $_[1] propagates back to @a because each element of @_ is an alias for each element in @a.
Your second example is a little tricky. Hash keys are immutable. Perl doesn't provide a way to modify a hash key, aside from deleting it. That means that $_[0] is not modifiable. When you attempt to modify $_[0] Perl cannot comply with that request. It probably ought to throw a warning, but doesn't. You see, the flat list passed to it consists of unmodifiable-key followed by modifiable-value, etc. This is mostly a non-issue. I cannot think of any reason to modify individual elements of a hash in the way you're demonstrating; since hashes have no particular order you wouldn't have simple control over which elements in @_ propagate back to which values in %a.
As you pointed out, the proper protocol is to pass \@a or \%a, so that they can be referred to as $_[0]->{element} or $_[0]->[0]. Even though the notation is a little more complicated, it becomes second nature after a while, and is much clearer (in my opinion) as to what is going on.
Be sure to have a look at the perlsub documentation. In particular:
Any arguments passed in show up in the array @_. Therefore, if you called a function with two arguments, those would be stored in $_[0] and $_[1]. The array @_ is a local array, but its elements are aliases for the actual scalar parameters. In particular, if an element $_[0] is updated, the corresponding argument is updated (or an error occurs if it is not updatable). If an argument is an array or hash element which did not exist when the function was called, that element is created only when (and if) it is modified or a reference to it is taken. (Some earlier versions of Perl created the element whether or not the element was assigned to.) Assigning to the whole array @_ removes that aliasing, and does not update any arguments.
(Note that use warnings is even more important than use strict.)
@_ itself isn't a reference to anything, it is an array (really, just a view of the stack, though if you do something like take a reference to it, it morphs into a real array) whose elements each are an alias to a passed parameter. And those passed parameters are the individual scalars passed; there is no concept of passing an array or hash (though you can pass a reference to one).
So shifts, splices, additional elements added, etc. to @_ don't affect anything passed, though they may change the index of or remove from the array one of the original aliases.
So where you call change(@a), this puts two aliases on the stack, one to $a[0] and one to $a[1]. change(%a) is more complicated; %a flattens out into an alternating list of keys and values, where the values are the actual hash values and modifying them modifies what's stored in the hash, but where the keys are merely copies, no longer associated with the hash.
Perl does not pass the array or hash itself by reference, it unfurls the entries (the array elements, or the hash keys and values) into a list and passes this list to the function. @_ then allows you to access the scalars as references.
This is roughly the same as writing:
@a = (1, 2, 3);
$b = \$a[2];
${$b} = 4;
# @a is now (1, 2, 4)
You'll note that in the first case you were not able to add an extra item to @a, all that happened was that you modified the members of @a that already existed. In the second case, the hash keys don't really exist in the hash as scalars, so these need to be created as copies in temporary scalars when the expanded list of the hash is created to be passed into the function. Modifying this temporary scalar will not modify the hash key, as it is not the hash key.
If you want to modify an array or hash in a function, you will need to pass a reference to the container:
change(\%foo);
sub change {
$_[0]->{a} = 1;
}
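For completeness, a sketch of doing what the two programs in the question were trying to do, this time through references. extend_array() and rename_key() are hypothetical helper names I made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helpers: each receives a reference and changes the caller's container.
sub extend_array {
    my ($aref) = @_;
    push @$aref, 'z';                        # really adds an element to the caller's array
    return;
}

sub rename_key {
    my ($href, $old, $new) = @_;
    $href->{$new} = delete $href->{$old};    # keys cannot be edited, so move the value
    return;
}

my @letters = ('a', 'b');
my %map     = (a => 'b');

extend_array(\@letters);
rename_key(\%map, 'a', 'x');

print "@letters\n";               # a b z
print join('=>', %map), "\n";     # x=>b
```

Copying the reference out of @_ with my (...) = @_ is safe here, because the copy still points at the caller's array or hash.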
Firstly, you are confusing the @ sigil as indicating an array. This is actually a list. When you call change(@a) you are passing the list to the function, not an array object.
The case with the hash is slightly different. Perl evaluates your call into a list and passes the values as a list instead.

Why does my Perl print show HASH(0x100a2d018)?

Here I am thinking I know how to use lists in Perl, when this happens. If I do this (debugging code, prettiness not included):
#! /usr/bin/perl -w
use strict;
my $temp1 = "FOOBAR";
my $temp2 = "BARFOO!";
my @list = { $temp1, $temp2 };
print $temp1; #this works fine
print $list[0]; #this prints out HASH(0x100a2d018)
It looks like I am printing out the address of the second string. How do I get at the actual string stored inside the list? I assume it has something to do with references, but dunno for sure.
my @list = { $temp1, $temp2 };
should be
my @list = ( $temp1, $temp2 ); # Parentheses instead of curly braces.
What your original code did was store a reference to a hash {$temp1 => $temp2} into @list's first element ($list[0]). This is a perfectly valid thing to do (which is why you didn't get a syntax error), it's just not what you intended to do.
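A quick sketch that shows the difference directly. The names @wrong and @right are mine; the two strings are the ones from the question.

```perl
use strict;
use warnings;

my ($temp1, $temp2) = ('FOOBAR', 'BARFOO!');

my @wrong = { $temp1, $temp2 };    # one element: a reference to the hash (FOOBAR => 'BARFOO!')
my @right = ( $temp1, $temp2 );    # two elements: the strings themselves

print scalar(@wrong), " ", ref($wrong[0]), "\n";    # 1 HASH
print $wrong[0]{$temp1}, "\n";                      # BARFOO!
print "$right[0] $right[1]\n";                      # FOOBAR BARFOO!
```

Printing $wrong[0] on its own gives the HASH(0x...) string from the question, because a reference stringifies to its type and address.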
You already got the answer to your question, don't use {}, because that creates an anonymous hash reference.
However, there is still the matter of the question you didn't know you asked.
What is the difference between an array and a list in Perl?
In your question, you use the term 'list' as if it were interchangeable with the term array, but the terms are not interchangeable. It is important to understand what the difference is.
An array is a type of variable. You can assign values to it. You can take references to it.
A list is an ordered group of zero or more scalars that is created when an expression is evaluated in a list context.
Say what?
Ok, consider the case of my $foo = (1,2,3). Here $foo is a scalar, and so the expression (1,2,3) is evaluated in a scalar context.
On the surface it is easy to look at (1,2,3) and say that's a literal list. But it is not.
It is a group of literal values strung together using the comma operator. In a scalar context, the comma operator returns the right hand value, so we really have ((1 ,2),3) which becomes ((2),3) and finally 3.
Now my @foo = (1,2,3) is very different. Assignment into an array occurs in a list context, so we evaluate (1,2,3) in list context. Here the comma operator inserts both sides into the list. So we have ((1,2),3) which evaluates to (list_of(1,2),3) and then list_of(list_of(1,2),3), since Perl flattens lists, this becomes list_of(1,2,3). The resulting list is assigned into @foo. Note that there is no list_of operator in Perl, I am trying to differentiate between what is commonly thought of as a literal list and an actual list. Since there is no way to directly express an actual list in Perl, I had to make one up.
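The context difference is easy to see if the values are not 1, 2, 3 (where "last value" and "count" happen to coincide). A small sketch, with values and variable names chosen by me:

```perl
use strict;
use warnings;
no warnings 'void';                  # silence "Useless use of a constant in void context"

my $last    = (4, 5, 6);             # comma operator in scalar context: the last value
my @array   = (4, 5, 6);             # list context: all three values are stored
my $count   = @array;                # an array in scalar context: its element count
my ($first) = (4, 5, 6);             # list assignment to a one-element list
my $n       = () = (4, 5, 6);        # list assignment evaluated in scalar context

print "$last $count $first $n\n";    # prints "6 3 4 3"
```

Same parentheses and commas on the right-hand side each time; only the context supplied by the left-hand side changes the result.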
So, what does all this mean to someone who is just learning Perl? I'll boil it down to a couple of key points:
Learn about and pay attention to context.
Remember that your array variables are arrays and not lists.
Don't worry too much if this stuff seems confusing.
DWIM does, mostly--most of the time the right things will happen without worrying about the details.
While you are pondering issues of context, you might want to look at these links:
Start with the discussion of context in Programming Perl. Larry et alia explain it all much more clearly than I do.
Perlop means something entirely different when you pay attention to what each operator returns based on context.
A nice discussion of scalar and context on Perlmonks.
A short introductory article about context: Context is Everything.
MJD explains context.
The perldoc for scalar and wantarray