What is the difference between $this, @that, and %those in Perl?
A useful mnemonic for Perl sigils is:
$calar
@rray
%ash
Matt Trout wrote a great comment on blog.fogus.me about Perl sigils, which I think is useful, so I have pasted it below:
Actually, perl sigils don’t denote variable type – they denote conjugation – $ is ‘the’, @ is
‘these’, % is ‘map of’ or so – variable type is denoted via [] or {}. You can see this with:
my $foo = 'foo';
my @foo = ('zero', 'one', 'two');
my $second_foo = $foo[1];
my @first_and_third_foos = @foo[0,2];
my %foo = (key1 => 'value1', key2 => 'value2', key3 => 'value3');
my $key2_foo = $foo{key2};
my ($key1_foo, $key3_foo) = @foo{'key1','key3'};
so looking at the sigil when skimming perl code tells you what you’re going to -get- rather
than what you’re operating on, pretty much.
This is, admittedly, really confusing until you get used to it, but once you -are- used to it
it can be an extremely useful tool for absorbing information while skimming code.
You’re still perfectly entitled to hate it, of course, but it’s an interesting concept and I
figure you might prefer to hate what’s -actually- going on rather than what you thought was
going on :)
$this is a scalar value, it holds 1 item like apple
@that is an array of values, it holds several like ("apple", "orange", "pear")
%those is a hash of values, it holds key value pairs like ("apple" => "red", "orange" => "orange", "pear" => "yellow")
See perlintro for more on Perl variable types.
Perl's inventor was a linguist, and he sought to make Perl like a "natural language".
From this post:
Disambiguation by number, case and word order
Part of the reason a language can get away with certain local ambiguities is that other ambiguities are suppressed by various mechanisms. English uses number and word order, with vestiges of a case system in the pronouns: "The man looked at the men, and they looked back at him." It's perfectly clear in that sentence who is doing what to whom. Similarly, Perl has number markers on its nouns; that is, $dog is one pooch, and @dog is (potentially) many. So $ and @ are a little like "this" and "these" in English. [emphasis added]
People often try to tie sigils to variable types, but they are only loosely related. It's a topic we hit very hard in Learning Perl and Effective Perl Programming because it's much easier to understand Perl when you understand sigils.
Many people forget that variables and data are actually separate things. Variables can store data, but you don't need variables to use data.
The $ denotes a single scalar value (not necessarily a scalar variable):
$scalar_var
$array[1]
$hash{key}
The @ denotes multiple values. That could be the array as a whole, a slice, or a dereference:
@array;
@array[1,2]
@hash{qw(key1 key2)}
@{ func_returning_array_ref };
The % denotes pairs (keys and values), which might be a hash variable or a dereference:
%hash
%$hash_ref
Under Perl v5.20, the % can now denote a key/value slice of either a hash or array:
%array[ @indices ]; # returns pairs of indices and elements
%hash{ @keys }; # returns pairs of key-values for those keys
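As a rough illustration of the three sigils side by side, here is a small sketch. The variable names (@fruit, %colour and so on) are made up for the example, and the last two assignments assume Perl 5.20 or later for the key/value slices.

```perl
use v5.20;       # key/value slices need Perl 5.20 or later
use strict;
use warnings;

# Illustrative data; the names are assumptions for this sketch.
my @fruit  = ('apple', 'orange', 'pear');
my %colour = (apple => 'red', orange => 'orange', pear => 'yellow');

# $ : exactly one value, whether it comes from an array or a hash
my $second = $fruit[1];
my $shade  = $colour{pear};

# @ : a list of values (array slice and hash slice)
my @ends   = @fruit[0, 2];
my @shades = @colour{'apple', 'pear'};

# % : key/value pairs (index/element pairs for an array)
my %indexed = %fruit[0, 2];
my %subset  = %colour{'apple'};

print "$second $shade @ends @shades\n";
print join(', ', map { "$_=$indexed{$_}" } sort keys %indexed), "\n";
```

In each case the brackets say which container is being read ([] for the array, {} for the hash), while the sigil says what comes back.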
You might want to look at the perlintro and perlsyn documents in order to really get started with understanding Perl (i.e., Read The Flipping Manual). :-)
That said:
$this is a scalar, which can store a number (int or float), a string, or a reference (see below);
@that is an array, which can store an ordered list of scalars (see above). You can add a scalar to an array with the push or unshift functions (see perlfunc), and you can use a parentheses-bounded comma-separated list of scalar literals or variables to create an array literal (e.g., my @array = ($a, $b, 6, "seven");)
%those is a hash, which is an associative array. Hashes have key-value pairs of entries, such that you can access the value of a hash by supplying its key. Hash literals can also be specified much like lists, except that every odd entry is a key and every even one is a value. You can also use the => operator instead of a comma to separate a key and a value (e.g., my %ordinals = ("one" => "first", "two" => "second");).
Normally, when you pass arrays or hashes to subroutine calls, the individual lists are flattened into one long list. This is sometimes desirable, sometimes not. In the latter case, you can use references to pass a reference to an entire list as a single scalar argument. The syntax and semantics of references are tricky, though, and fall beyond the scope of this answer. If you want to check it out, though, see perlref.
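To make the flattening concrete, here is a minimal sketch. The subs count_args and longer are hypothetical helpers invented for this example, not anything from the Perl core.

```perl
use strict;
use warnings;

# Hypothetical helper: reports how many arguments it received.
sub count_args { return scalar @_ }

my @odd  = (1, 3, 5);
my @even = (2, 4);

# Both arrays are flattened into one five-element list.
my $flat = count_args(@odd, @even);

# Two references arrive as two scalars, so the arrays stay separate.
my $refs = count_args(\@odd, \@even);

# Hypothetical helper: returns whichever array reference has more elements.
sub longer {
    my ($first, $second) = @_;
    return @$first >= @$second ? $first : $second;
}

my $winner = longer(\@odd, \@even);
print "flat=$flat refs=$refs longer=(@$winner)\n";
```

With plain arrays the sub cannot tell where @odd ends and @even begins; with references it can.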
Related
Below is code in which I need help.
#!/usr/bin/perl -w
use strict;
use Data::Dumper;
my @arrayElements = ('Array Functions');
print join(", ", @arrayElements);
### Output => Array Functions
my %hashElements = ();
I want to assign the content of @arrayElements to $hashElements{Item}
I am either missing some core concepts or going about this the wrong way, and I have been struggling with it for a while.
You seem to be missing some core concepts of Perl (or programming in general). If you are learning Perl through a book or online tutorial, I suggest you re-read the chapters on arrays and hashes.
Let's look at the things involved here. You have:
@arrayElements, which is an array. It contains a list with one element, the string 'Array Functions'.
%hashElements, which is a hash. It's empty.
$hashElements{Item}, which is a scalar value. You want to set this.
You say you want $hashElements{Item} to have the value 'Array Functions', which you have as the first element in your array @arrayElements.
$hashElements{Item} = $arrayElements[0];
And that's it. Both $hashElements{Item} and $arrayElements[0] are scalar values. That's why the sigil (the sign at the front) changes from @ (for array) or % (for hash) to $. You can distinguish whether the value came from a hash or an array by the brackets used to access the elements. [] is for arrays, and {} is for hashes.
You cannot do the following though.
$hashElements{Item} = @arrayElements;
Because $hashElements{Item} is a scalar, the thing on the right hand side of the assignment will be treated in scalar context. An array in scalar context gets converted to the number of elements in the array, so this would assign 1. That's not what you want.
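For comparison, here is a small sketch of the three assignments one might try. The second array element and the hash keys first, count and all are made up for the example. Storing a reference in the last line is my own suggestion for the case where the whole array is wanted, not something the question asked for.

```perl
use strict;
use warnings;

# Illustrative data; the second element is an assumption for this sketch.
my @array_elements = ('Array Functions', 'Hash Functions');
my %hash_elements;

# One element of the array, copied into the hash.
$hash_elements{first} = $array_elements[0];

# An array in scalar context yields its element count.
$hash_elements{count} = @array_elements;

# A reference is a scalar, so it fits in a hash value and still gives
# access to every element. [ @array_elements ] makes a shallow copy.
$hash_elements{all} = [ @array_elements ];

print "first: $hash_elements{first}\n";
print "count: $hash_elements{count}\n";
print "all:   ", join(', ', @{ $hash_elements{all} }), "\n";
```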
You should really read up more about this, and also pick better names for your variables. Your example is very confusing. In general, we don't do $CamelCase for variable names in Perl, but instead use $snake_case, which is easier to read and type.
Take a look at the following resources to learn more about the concepts I've mentioned above.
Perl Maven, perldata, perldsc
I would like to properly understand hashes in Perl. I've had to use Perl intermittently for quite some time and mostly whenever I need to do it, it's mostly related to text processing.
And everytime, I have to deal with hashes, it gets messed up. I find the syntax very cryptic for hashes
A good explanation of hashes and hash references, their differences, when they are required etc. would be much appreciated.
A simple hash is close to an array. Their initializations even look similar. First the array:
@last_name = (
"Ward", "Cleaver",
"Fred", "Flintstone",
"Archie", "Bunker"
);
Now let's represent the same information with a hash (aka associative array):
%last_name = (
"Ward", "Cleaver",
"Fred", "Flintstone",
"Archie", "Bunker"
);
Although they have the same name, the array @last_name and the hash %last_name are completely independent.
With the array, if we want to know Archie's last name, we have to perform a linear search:
my $lname;
for (my $i = 0; $i < @last_name; $i += 2) {
$lname = $last_name[$i+1] if $last_name[$i] eq "Archie";
}
print "Archie $lname\n";
With the hash, it's much more direct syntactically:
print "Archie $last_name{Archie}\n";
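Two other everyday hash operations follow the same pattern. This is only a sketch; the names $knows_fred, $knows_barney and @lines are invented for the example.

```perl
use strict;
use warnings;

my %last_name = (
    Ward   => 'Cleaver',
    Fred   => 'Flintstone',
    Archie => 'Bunker',
);

# exists checks for a key without creating it.
my $knows_fred   = exists $last_name{Fred}   ? 1 : 0;
my $knows_barney = exists $last_name{Barney} ? 1 : 0;

# Hashes are unordered, so sort the keys for stable output.
my @lines = map { "$_ $last_name{$_}" } sort keys %last_name;
print "$_\n" for @lines;
```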
Say we want to represent information with only slightly richer structure:
Cleaver (last name)
Ward (first name)
June (spouse's first name)
Flintstone
Fred
Wilma
Bunker
Archie
Edith
Before references came along, flat key-value hashes were about the best we could do, but references allow
my %personal_info = (
"Cleaver", {
"FIRST", "Ward",
"SPOUSE", "June",
},
"Flintstone", {
"FIRST", "Fred",
"SPOUSE", "Wilma",
},
"Bunker", {
"FIRST", "Archie",
"SPOUSE", "Edith",
},
);
Internally, the keys and values of %personal_info are all scalars, but the values are a special kind of scalar: hash references, created with {}. The references allow us to simulate "multi-dimensional" hashes. For example, we can get to Wilma via
$personal_info{Flintstone}->{SPOUSE}
Note that Perl allows us to omit arrows between subscripts, so the above is equivalent to
$personal_info{Flintstone}{SPOUSE}
That's a lot of typing if you want to know more about Fred, so you might grab a reference as sort of a cursor:
$fred = $personal_info{Flintstone};
print "Fred's wife is $fred->{SPOUSE}\n";
Because $fred in the snippet above is a hashref, the arrow is necessary. If you leave it out but wisely enabled use strict to help you catch these sorts of errors, the compiler will complain:
Global symbol "%fred" requires explicit package name at ...
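Here is a short sketch of walking the whole nested structure with the same "cursor" idea. The loop variables $surname and $person and the @report array are assumptions for the example.

```perl
use strict;
use warnings;

my %personal_info = (
    Cleaver    => { FIRST => 'Ward',   SPOUSE => 'June'  },
    Flintstone => { FIRST => 'Fred',   SPOUSE => 'Wilma' },
    Bunker     => { FIRST => 'Archie', SPOUSE => 'Edith' },
);

my @report;
for my $surname (sort keys %personal_info) {
    my $person = $personal_info{$surname};   # a hash reference
    # %$person dereferences the whole inner hash; $person->{...} one element.
    my $fields = keys %$person;
    push @report,
        "$person->{FIRST} $surname is married to $person->{SPOUSE} ($fields fields)";
}
print "$_\n" for @report;
```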
Perl references are similar to pointers in C and C++, but they can never be null. Pointers in C and C++ require dereferencing and so do references in Perl.
C and C++ function parameters have pass-by-value semantics: they're just copies, so modifications don't get back to the caller. If you want to see the changes, you have to pass a pointer. You can get this effect with references in Perl:
sub add_barney {
my($personal_info) = @_;
$personal_info->{Rubble} = {
FIRST => "Barney",
SPOUSE => "Betty",
};
}
add_barney \%personal_info;
Without the backslash, add_barney would have gotten a copy that's thrown away as soon as the sub returns.
Note also the use of the "fat comma" (=>) above. It autoquotes the string on its left and makes hash initializations less syntactically noisy.
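To see what the backslash changes in practice, here is a minimal sketch that puts the two calling styles next to each other. The subs add_to_copy and add_via_ref and the hash %first_name are hypothetical names for this example.

```perl
use strict;
use warnings;

# Hypothetical sub: rebuilds a private hash from the flattened arguments.
sub add_to_copy {
    my %copy = @_;
    $copy{Rubble} = 'Barney';
    return scalar keys %copy;
}

# Hypothetical sub: receives a reference and changes the caller's hash.
sub add_via_ref {
    my ($info) = @_;
    $info->{Rubble} = 'Barney';
    return scalar keys %$info;
}

my %first_name = (Flintstone => 'Fred');

my $in_copy    = add_to_copy(%first_name);    # counted inside the sub
my $after_copy = keys %first_name;            # caller's hash afterwards
my $in_ref     = add_via_ref(\%first_name);
my $after_ref  = keys %first_name;

print "copy: $in_copy inside, $after_copy outside\n";
print "ref:  $in_ref inside, $after_ref outside\n";
```

Only the call that passes \%first_name leaves the new entry visible to the caller.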
The following demonstrates how you can use a hash and a hash reference:
my %hash = (
toy => 'aeroplane',
colour => 'blue',
);
print "I have an ", $hash{toy}, " which is coloured ", $hash{colour}, "\n";
my $hashref = \%hash;
print "I have an ", $hashref->{toy}, " which is coloured ", $hashref->{colour}, "\n";
Also see perldoc perldsc.
A hash is a basic data type in Perl. It uses keys to access its contents.
A hash ref is an abbreviation for a reference to a hash. References are scalars, that is, simple values. A hash ref is a scalar value that contains, essentially, a pointer to the actual hash itself.
Link: difference between hash and hash ref in perl - Ubuntu Forums
A difference is also in the syntax for deleting. Like C, perl works like this for Hashes:
delete $hash{$key};
and for Hash References
delete $hash_ref->{$key};
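A quick sketch of both forms acting on the same data, assuming $hash_ref was made by taking a reference to %hash (the variable names are only for illustration):

```perl
use strict;
use warnings;

my %hash     = (toy => 'aeroplane', colour => 'blue');
my $hash_ref = \%hash;                 # refers to the same hash

# delete returns the value it removed.
my $removed = delete $hash{toy};

# Deleting through the reference affects the same underlying hash.
delete $hash_ref->{colour};

my $left = keys %hash;
print "removed=$removed remaining=$left\n";
```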
The Perl Hash Howto is a great resource to understand Hashes versus Hash with Hash References
There is also another link here that has more information on perl and references.
See perldoc perlreftut which is also accessible on your own computer's command line.
A reference is a scalar value that refers to an entire array or an entire hash (or to just about anything else). Names are one kind of reference that you're already familiar with. Think of the President of the United States: a messy, inconvenient bag of blood and bones. But to talk about him, or to represent him in a computer program, all you need is the easy, convenient scalar string "Barack Obama".
References in Perl are like names for arrays and hashes. They're Perl's private, internal names, so you can be sure they're unambiguous. Unlike "Barack Obama", a reference only refers to one thing, and you always know what it refers to. If you have a reference to an array, you can recover the entire array from it. If you have a reference to a hash, you can recover the entire hash. But the reference is still an easy, compact scalar value.
I'm trying to figure out Perl subroutines and how they work.
From perlsub I understand that subroutines are call-by-reference and that an assignment (like my(@copy) = @_;) is needed to turn them into call-by-value.
In the following, I see that change is called-by-reference because "a" and "b" are changed into "x" and "y". But I'm confused about why the array isn't extended with an extra element "z"?
use strict;
use Data::Dumper;
my @a = ( "a" ,"b" );
change(@a);
print Dumper(\@a);
sub change
{
@_[0] = "x";
@_[1] = "y";
@_[2] = "z";
}
Output:
$VAR1 = [
'x',
'y'
];
In the following, I pass a hash instead of an array. Why isn't the key changed from "a" to "x"?
use strict;
use Data::Dumper;
my %a = ( "a" => "b" );
change(%a);
print Dumper(\%a);
sub change
{
@_[0] = "x";
@_[1] = "y";
}
Output:
$VAR1 = {
'a' => 'y'
};
I know the real solution is to pass the array or hash by reference using \@, but I'd like to understand the behaviour of these programs exactly.
Perl always passes by reference. It's just that sometimes the caller passes temporary scalars.
The first thing you have to realise is that the arguments of subs can be one and only one thing: a list of scalars.* One cannot pass arrays or hashes to them. Arrays and hashes are evaluated, returning a list of their content. That means that
f(@a)
is the same** as
f($a[0], $a[1], $a[2])
Perl passes by reference. Specifically, Perl aliases each of the arguments to the elements of @_. Modifying the elements of @_ will change the scalars returned by $a[0], etc., and thus will modify the elements of @a.
The second thing of importance is that the key of an array or hash element determines where the element is stored in the structure. Otherwise, $a[4] and $h{k} would require looking at each element of the array or hash to find the desired value. This means that the keys aren't modifiable. Moving a value requires creating a new element with the new key and deleting the element at the old key.
As such, whenever you get the keys of an array or hash, you get a copy of the keys. Fresh scalars, so to speak.
Back to the question,
f(%h)
is the same** as
f(
my $k1 = "a", $h{a},
my $k2 = "b", $h{b},
my $k3 = "c", $h{c},
)
@_ is still aliased to the values returned by %h, but some of those are just temporary scalars used to hold a key. Changing those will have no lasting effect.
* — Some built-ins (e.g. grep) are more like flow control statements (e.g. while). They have their own parsing rules, and thus aren't limited to the conventional model of a sub.
** — Prototypes can affect how the argument list is evaluated, but it will still result in a list of scalars.
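The difference between aliased values and copied keys can be seen in a few lines. This is just a sketch; shout is a hypothetical sub that upper-cases its arguments in place.

```perl
use strict;
use warnings;

# Hypothetical sub: upper-cases every argument in place through @_.
sub shout { $_ = uc for @_ }

my @a = ('a', 'b');
shout(@a);                    # elements are aliased, so @a changes

my %h = (a => 'b');
shout(%h);                    # the key is a temporary copy, the value is aliased

print "@a\n";                                          # A B
print join(',', map { "$_=$h{$_}" } keys %h), "\n";    # a=B
```

The upper-cased key is written to a temporary scalar and discarded, while the upper-cased value lands in the hash.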
Perl's subroutines accept parameters as flat lists of scalars. An array passed as a parameter is for all practical purposes a flat list too. Even a hash is treated as a flat list of one key followed by one value, followed by one key, etc.
A flat list is not passed as a reference unless you do so explicitly. The fact that modifying $_[0] modifies $a[0] is because the elements of @_ become aliases for the elements passed as parameters. Modifying $_[0] is the same as modifying $a[0] in your example. But while this is approximately similar to the common notion of "pass by reference" as it applies to any programming language, this isn't specifically passing a Perl reference; Perl's references are different (and indeed "reference" is an overloaded term). An alias (in Perl) is a synonym for something, whereas a reference is similar to a pointer to something.
As perlsub states, if you assign to @_ as a whole, you break its alias status. Also note, if you try to modify $_[0], and $_[0] happens to be a literal instead of a variable, you'll get an error. On the other hand, modifying $_[0] does modify the caller's value if it is modifiable. So in example one, changing $_[0] and $_[1] propagates back to @a because each element of @_ is an alias for each element in @a.
Your second example is a little tricky. Hash keys are immutable. Perl doesn't provide a way to modify a hash key, aside from deleting it. That means that $_[0] is not modifiable. When you attempt to modify $_[0] Perl cannot comply with that request. It probably ought to throw a warning, but doesn't. You see, the flat list passed to it consists of unmodifiable-key followed by modifiable-value, etc. This is mostly a non-issue. I cannot think of any reason to modify individual elements of a hash in the way you're demonstrating; since hashes have no particular order you wouldn't have simple control over which elements in @_ propagate back to which values in %a.
As you pointed out, the proper protocol is to pass \@a or \%a, so that they can be referred to as $_[0]->[0] or $_[0]->{element}. Even though the notation is a little more complicated, it becomes second nature after a while, and is much clearer (in my opinion) as to what is going on.
Be sure to have a look at the perlsub documentation. In particular:
Any arguments passed in show up in the array @_. Therefore, if you called a function with two arguments, those would be stored in $_[0] and $_[1]. The array @_ is a local array, but its elements are aliases for the actual scalar parameters. In particular, if an element $_[0] is updated, the corresponding argument is updated (or an error occurs if it is not updatable). If an argument is an array or hash element which did not exist when the function was called, that element is created only when (and if) it is modified or a reference to it is taken. (Some earlier versions of Perl created the element whether or not the element was assigned to.) Assigning to the whole array @_ removes that aliasing, and does not update any arguments.
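A small sketch of the last sentence of that passage. The subs bump_alias and bump_copy are hypothetical and exist only to contrast the two cases.

```perl
use strict;
use warnings;

# Hypothetical sub: modifies one element of @_, which is an alias.
sub bump_alias { $_[0]++ }

# Hypothetical sub: assigns to @_ as a whole first, which drops the aliases.
sub bump_copy {
    @_ = (@_);    # @_ now holds fresh copies of the arguments
    $_[0]++;      # changes only the copy
}

my $n = 1;
bump_alias($n);
my $after_alias = $n;

bump_copy($n);
my $after_copy = $n;

print "after alias: $after_alias, after copy: $after_copy\n";
```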
(Note that use warnings is even more important than use strict.)
@_ itself isn't a reference to anything, it is an array (really, just a view of the stack, though if you do something like take a reference to it, it morphs into a real array) whose elements each are an alias to a passed parameter. And those passed parameters are the individual scalars passed; there is no concept of passing an array or hash (though you can pass a reference to one).
So shifts, splices, additional elements added, etc. to @_ don't affect anything passed, though they may change the index of or remove from the array one of the original aliases.
So where you call change(@a), this puts two aliases on the stack, one to $a[0] and one to $a[1]. change(%a) is more complicated; %a flattens out into an alternating list of keys and values, where the values are the actual hash values and modifying them modifies what's stored in the hash, but where the keys are merely copies, no longer associated with the hash.
Perl does not pass the array or hash itself by reference, it unfurls the entries (the array elements, or the hash keys and values) into a list and passes this list to the function. @_ then allows you to access the scalars as references.
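As a sketch of why the extra element never reaches the caller, here is a cut-down version of the first example. It uses $_[0] and push rather than the slice syntax from the question.

```perl
use strict;
use warnings;

# Cut-down sub: edits an existing element, then tries to grow @_.
sub change {
    $_[0] = 'x';          # alias to the caller's first element
    push @_, 'z';         # grows @_ only; the caller's array is untouched
    return scalar @_;
}

my @a = ('a', 'b');
my $seen = change(@a);    # number of elements the sub ended up with

print "inside: $seen, outside: @a\n";   # inside: 3, outside: x b
```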
This is roughly the same as writing:
@a = (1, 2, 3);
$b = \$a[2];
${$b} = 4;
# @a is now (1, 2, 4)
You'll note that in the first case you were not able to add an extra item to @a, all that happened was that you modified the members of @a that already existed. In the second case, the hash keys don't really exist in the hash as scalars, so these need to be created as copies in temporary scalars when the expanded list of the hash is created to be passed into the function. Modifying this temporary scalar will not modify the hash key, as it is not the hash key.
If you want to modify an array or hash in a function, you will need to pass a reference to the container:
change(\%foo);
sub change {
$_[0]->{a} = 1;
}
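The same idea for the array case from the question, as a minimal sketch. The parameter name $items is an assumption for the example.

```perl
use strict;
use warnings;

# Sketch: takes an array reference and changes the array itself.
sub change {
    my ($items) = @_;
    $items->[0] = 'x';
    $items->[1] = 'y';
    push @$items, 'z';        # the caller's array grows
}

my @a = ('a', 'b');
change(\@a);

print "@a\n";
```

Because the sub holds a reference to the container, it can add elements as well as overwrite existing ones.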
Firstly, you are confusing the @ sigil as indicating an array. This is actually a list. When you call change(@a) you are passing the list to the function, not an array object.
The case with the hash is slightly different. Perl evaluates your call into a list and passes the values as a list instead.
Here I am thinking I know how to use lists in Perl, when this happens. If I do this (debugging code, prettiness not included):
#! /usr/bin/perl -w
use strict;
my $temp1 = "FOOBAR";
my $temp2 = "BARFOO!";
my @list = { $temp1, $temp2 };
print $temp1; #this works fine
print $list[0]; #this prints out HASH(0x100a2d018)
It looks like I am printing out the address of the second string. How do I get at the actual string stored inside the list? I assume it has something to do with references, but dunno for sure.
my @list = { $temp1, $temp2 };
should be
my @list = ( $temp1, $temp2 ); # Parentheses instead of curly braces.
What your original code did was store a reference to a hash {$temp1 => $temp2} into @list's first element ($list[0]). This is a perfectly valid thing to do (which is why you didn't get a syntax error), it's just not what you intended to do.
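A short sketch that puts the three bracket styles side by side and uses ref to show what ends up in the array. The array names are made up for the example.

```perl
use strict;
use warnings;

my ($temp1, $temp2) = ('FOOBAR', 'BARFOO!');

my @with_parens   = ( $temp1, $temp2 );   # two strings
my @with_braces   = { $temp1, $temp2 };   # one hash reference
my @with_brackets = [ $temp1, $temp2 ];   # one array reference

# ref returns the kind of reference, or an empty string for a plain scalar.
my $braces_kind   = ref $with_braces[0];
my $brackets_kind = ref $with_brackets[0];

print scalar(@with_parens), " elements: @with_parens\n";
print "braces hold a $braces_kind ref: $with_braces[0]{$temp1}\n";
print "brackets hold an $brackets_kind ref: $with_brackets[0][1]\n";
```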
You already got the answer to your question, don't use {}, because that creates an anonymous hash reference.
However, there is still the matter of the question you didn't know you asked.
What is the difference between an array and a list in Perl?
In your question, you use the term 'list' as if it were interchangeable with the term array, but the terms are not interchangeable. It is important to understand what the difference is.
An array is a type of variable. You can assign values to it. You can take references to it.
A list is an ordered group of zero or more scalars that is created when an expression is evaluated in a list context.
Say what?
Ok, consider the case of my $foo = (1,2,3). Here $foo is a scalar, and so the expression (1,2,3) is evaluated in a scalar context.
On the surface it is easy to look at (1,2,3) and say that's a literal list. But it is not.
It is a group of literal values strung together using the comma operator. In a scalar context, the comma operator returns the right hand value, so we really have ((1 ,2),3) which becomes ((2),3) and finally 3.
Now my @foo = (1,2,3) is very different. Assignment into an array occurs in a list context, so we evaluate (1,2,3) in list context. Here the comma operator inserts both sides into the list. So we have ((1,2),3) which evaluates to (list_of(1,2),3) and then list_of(list_of(1,2),3). Since Perl flattens lists, this becomes list_of(1,2,3). The resulting list is assigned into @foo. Note that there is no list_of operator in Perl; I am trying to differentiate between what is commonly thought of as a literal list and an actual list. Since there is no way to directly express an actual list in Perl, I had to make one up.
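A few one-liners may help show how context changes the result. This is only a sketch, and the variable names are invented for the example.

```perl
use strict;
use warnings;

my @foo = (1, 2, 3);

my $count   = @foo;           # array in scalar context: its element count
my ($first) = @foo;           # list assignment: the first element
my $last    = (4, 5, 6)[-1];  # list slice: the last element of the list

# Assigning to an empty list forces list context on the right-hand side;
# the list assignment itself, in scalar context, returns how many values
# the right-hand side produced.
my $produced = () = (7, 8, 9);

print "$count $first $last $produced\n";
```

Note that the parentheses around $first are what turn the second line into a list assignment; without them it would be a scalar assignment and $first would get the count.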
So, what does all this mean to someone who is just learning Perl? I'll boil it down to a couple of key points:
Learn about and pay attention to context.
Remember that your array variables are arrays and not lists.
Don't worry too much if this stuff seems confusing.
DWIM does, mostly--most of the time the right things will happen without worrying about the details.
While you are pondering issues of context, you might want to look at these links:
Start with the discussion of context in Programming Perl. Larry et alia explain it all much more clearly than I do.
Perlop means something entirely different when you pay attention to what each operator returns based on context.
A nice discussion of scalar and context on Perlmonks.
A short introductory article about context: Context is Everything.
MJD explains context.
The perldoc for scalar and wantarray
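As a small taste of what those last two functions do, here is a sketch. The sub name context is hypothetical; wantarray and scalar are the real built-ins.

```perl
use strict;
use warnings;

# Hypothetical sub: reports the context it was called in.
sub context {
    return wantarray         ? 'list'
         : defined wantarray ? 'scalar'
         :                     'void';
}

my @in_list   = context();          # list context
my $in_scalar = context();          # scalar context
my $forced    = scalar context();   # scalar() forces scalar context

print "$in_list[0] $in_scalar $forced\n";   # list scalar scalar
```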