What does the % mean in Perl? - perl

It's been many years since I last worked in Perl, and I find myself needing to get back up to speed. I have a line of code whose meaning I do not understand.
%form = %$input;
Can someone explain this or direct me to where I can find the answer? I do not understand the % symbol.

The short answer is that the line copies the contents of the hash referenced by $input into the named hash %form. The programmer is likely uncomfortable with reference syntax. No big whoop.
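A minimal sketch of what that assignment does, assuming a hypothetical $input hash reference with made-up form fields (neither the data nor the field names come from the original code). The copy is shallow and independent, so changing %form afterwards leaves the original hash alone:

```perl
use strict;
use warnings;

# Hypothetical data: $input is a reference to an anonymous hash.
my $input = { name => 'Ada', lang => 'Perl' };

# Dereference $input and copy its key-value pairs into the named hash %form.
my %form = %$input;

# %form is a separate (shallow) copy, so this does not touch $input's hash.
$form{lang} = 'Raku';

print "form:  $form{lang}\n";      # form:  Raku
print "input: $input->{lang}\n";   # input: Perl
```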
The key concept that people miss about Perl 5's sigils is that they show how you are treating a variable, not what type it is. $ is single items, @ is multiple items. % is hash stuff.
The % signifies a hash variable as a whole. So, %form is the entire hash named "form". But, to get a single element out of it, you use the $ (single element) sigil. When you see the {} after a variable, you know you are dealing with a hash:
%form # entire hash named "form"
$form{foo} # single value for key "foo" in hash form
@form{qw(foo bar)} # multiple values for keys "foo" and "bar" (slice)
The second part, %$input, is trickier (it's the stuff we cover in Intermediate Perl). $input is a reference to a hash. All references are scalars (so, the $ sigil). To use it as a hash, you have to dereference it. For a simple scalar like that, you can put the hash sigil in front: %$input. Now you can treat that as a hash and use the hash operators (keys, values, delete) on it.
There's also a postfix dereference (experimental in v5.20, stable since v5.24) so you can read left to right: $input->%*.
%$input # entire hash referenced by $input
$input->%* # entire hash, with new hotness postfix deref
${$input}{foo} # single element access: extra $ in front, braces around ref
$$input{foo} # same thing
$input->{foo} # single element access with arrow and braces
@{$input}{qw(foo bar)} # hash slice, multiple items get `@`
@$input{qw(foo bar)} # same thing
$input->@{qw(foo bar)} # same thing, but with postfix notation
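The equivalences in the listing above can be checked with a short script. This is only a sketch: the data in $input is made up for illustration, and it assumes a Perl new enough for `use v5.24`, which turns on the postfix dereference syntax.

```perl
use v5.24;   # enables strict and the postfix dereference syntax
use warnings;

# Hypothetical data for illustration.
my $input = { foo => 1, bar => 2, baz => 3 };

# Three spellings of single-element access.
my @single = ( ${$input}{foo}, $$input{foo}, $input->{foo} );

# Three spellings of a hash slice.
my @circumfix = @{$input}{qw(foo bar)};
my @short     = @$input{qw(foo bar)};
my @postfix   = $input->@{qw(foo bar)};

# Whole-hash dereference, both ways; keys in scalar context counts the pairs.
my $count_old = keys %$input;
my $count_new = keys $input->%*;

say "@single";                          # 1 1 1
say "@circumfix | @short | @postfix";   # 1 2 | 1 2 | 1 2
say "$count_old $count_new";            # 3 3
```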
Now there's an even more tricky thing. v5.20 introduces the key-value slice, so the % gets some more work to do. This is a slice that returns the keys along with the values, so it gets the % for hash like things:
%form{qw(key1 key2)} # returns a list of key-value pairs
But, this also works on arrays to get the index and value. You know it's an array because you see the [], but you know it's returning index-value pairs because you see the %:
%array[1,3,7] # returns list like ( 1, ..., 3, ..., 7, ...)
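A small sketch of both key-value slice forms, assuming Perl v5.20 or later. The contents of %form and @array are invented for the example:

```perl
use v5.20;
use warnings;

# Hypothetical data for illustration.
my %form  = ( key1 => 'a', key2 => 'b', key3 => 'c' );
my @array = map { "item$_" } 0 .. 9;

# Key-value slice of a hash: the result can be assigned straight to a new hash.
my %subset = %form{qw(key1 key2)};

# Index-value slice of an array: a flat list of index, value, index, value, ...
my @pairs = %array[1, 3, 7];

say join ',', map { "$_=$subset{$_}" } sort keys %subset;   # key1=a,key2=b
say "@pairs";                                               # 1 item1 3 item3 7 item7
```

Because the slice returns pairs, assigning it to a hash is a convenient way to pull out a subset of a larger hash.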

Related

Perl: Why can you use @ or $ when accessing a specific element in an array?

I'm a novice Perl user and have not been able to find a satisfactory answer
my @foo = ("foo","bar");
print "@foo[0]";
foo
print "$foo[1]"
bar
Not only does @foo[0] work as expected, but $foo[1] outputs a string as well.
Why? This is even when use strict is enabled.
Both @foo[0] and $foo[1] are legal Perl constructions.
$foo[$n], or more generally $foo[EXPR], is a scalar element, representing the $n-th (with the index starting at 0) element of the array @foo (or whatever EXPR evaluates to).
@foo[LIST] is an array slice, the set of elements of @foo indicated by indices in LIST. When LIST has one element, like @foo[0], then @foo[LIST] is a list with one element.
Although @foo[1] is a valid expression, experience has shown that that construction is usually used inappropriately, when it would be more appropriate to say $foo[1]. So a warning -- not an error -- is issued when warnings are enabled and you use that construction.
$foo[LIST] is also a valid expression in Perl. It's just that Perl will evaluate the LIST in scalar context and return the element of @foo corresponding to that evaluation, not a list of elements.
@foo = ("foo","bar","baz");
$foo[0] returns the scalar "foo"
@foo[1] returns the list ("bar") (and issues warning)
@foo[0,2] returns the list ("foo","baz")
$foo[0,2] returns "baz" ($foo[0,2] evaluates to $foo[2], and issues warning)
$foo[@foo] returns undef (evaluates to $foo[scalar @foo] => $foo[3])
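The two scalar-context cases in that table can be reproduced with a sketch like the following. Warnings are switched off around the questionable subscript so the script runs quietly; the variable names are chosen only for this example.

```perl
use strict;
use warnings;

my @foo = ("foo", "bar", "baz");

my $comma;
{
    # Silence the warnings Perl raises for this questionable construct.
    no warnings;
    # The comma operator in scalar context yields its last operand: $foo[2].
    $comma = $foo[0, 2];
}

# @foo in scalar context is the element count (3), one past the last index.
my $past_end = $foo[@foo];

# The last element is usually reached with $#foo or a negative index.
my $last = $foo[$#foo];

print "$comma\n";                                      # baz
print defined $past_end ? "defined" : "undef", "\n";   # undef
print "$last $foo[-1]\n";                              # baz baz
```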
Diversion: the only reason I could come up with to use @foo[SCALAR] is as an lvalue somewhere that distinguishes between scalar/list context. Consider
sub bar { wantarray ? (42,19) : 101 }
$foo[0] = bar(); would assign the value 101 to the 1st element of @foo, but @foo[0] = bar() would assign the value 42. But even that's not anything you couldn't accomplish by saying ($foo[0]) = bar(), unless you're golfing.
Perl sigils tell you how you are treating data and only loosely relate to variable type. Instead of thinking about what something is, think about what it is doing. It's verbs over nouns in Perl:
$ uses a single item
@ uses multiple items
% uses pairs
The scalar variable $foo is a single item and uses the $ sigil. Since there aren't multiples or pairs for a single item, the other sigils don't come into play.
The array variable @foo is potentially many items and the @ refers to all of those items together. The $foo[INDEX] refers to a single item in the array and uses the $ sigil. You can refer to multiple items with an array slice, such as @foo[INDEX,INDEX2,...] and you use the @ for that. Your question focuses on the degenerate case of a slice of one element, @foo[INDEX]. That works, but it's in list context. Sometimes that behaves differently.
The hash variable %foo is a collection of key-value pairs. To get a single value, you use the $ again, like $foo{KEY}. You can also get more than one value with a hash slice, using the @ because it's multiple values, like @hash{KEY1,KEY2,...}.
And, here's a recent development: Perl 5.20 introduces “Key/Value Slices”. You can take a key/value slice of either an array or a hash: %array[INDEX1,INDEX2] or %hash{KEY1,KEY2}. These return a list of key-value pairs. In the array case, the keys are the indices.
For arrays and hashes with single element access or either type of slice, you know the variable type by the indexing character: arrays use [] and hashes use {}. And, here's the other interesting wrinkle: those delimiters supply scalar or list context depending on single or (potentially) multiple items.
Both @foo[0] and $foo[1] are legal Perl constructions.
$foo[EXPR] (where EXPR is an arbitrary expression evaluated in scalar context) returns the single element specified by the result of the expression.
@foo[LIST] (where LIST is an arbitrary expression evaluated in list context) returns every element specified by the result of the expression.
(Contrary to other posts, there's no such thing as $foo[LIST] in Perl. When using $foo[...], the expression is always evaluated in scalar context.)
Although @foo[1] is a valid expression, experience has shown that that construction is usually used inappropriately, when it would be more appropriate to say $foo[1]. So a warning (not an exception) is issued when warnings are enabled and you use that construction.
What this means:
my @foo = ( "foo", "bar", "baz" );
$foo[0] 0 eval'ed in scalar cx. Returns scalar "foo". ok
@foo[1] 1 eval'ed in list cx. Returns scalar "bar". Weird. Warns.
@foo[0,2] 0,2 eval'ed in list cx. Returns scalars "foo" and "baz". ok
$foo[0,2] 0,2 eval'ed in scalar cx. Returns scalar "baz". Wrong. Warns.
$foo[@foo] @foo eval'ed in scalar cx. Returns undef. Probably wrong.
The only reason I could come up with to use @foo[SCALAR] is as an lvalue somewhere that distinguishes between scalar/list context. Consider
sub bar { wantarray ? (42,19) : 101 }
$foo[0] = bar(); would assign the value 101 to the 1st element of @foo, but @foo[0] = bar(); would assign the value 42. It would be far more common to use ($foo[0]) = bar() instead.
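The three assignments discussed above can be compared side by side. This sketch reuses the bar() sub from the text; the $from_* variable names are made up for the example, and the slice warning is suppressed locally so the script stays quiet.

```perl
use strict;
use warnings;

sub bar { return wantarray ? (42, 19) : 101 }

my @foo;

$foo[0] = bar();            # scalar assignment: bar() sees scalar context
my $from_scalar = $foo[0];

{
    no warnings 'syntax';   # suppress "Scalar value @foo[0] better written as $foo[0]"
    @foo[0] = bar();        # slice assignment: bar() sees list context
}
my $from_slice = $foo[0];

($foo[0]) = bar();          # the usual spelling of a list assignment to one element
my $from_parens = $foo[0];

print "$from_scalar $from_slice $from_parens\n";   # 101 42 42
```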
Portions of the post Copyrighted by mob under the same terms as this site.
This post addresses numerous issues in mob's post, including 1) the misuse of LIST to mean something other than an arbitrary expression in list context, 2) pretending that parens create lists, 3) pretending that there's a difference between returning scalars and returning a list, and 4) pretending there's no such thing as a non-fatal error.

In Perl, what is the difference between accessing an array element using @a[$i] as opposed to using $a[$i]?

Basic syntax tutorials I followed do not make this clear:
Is there any practical/philosophical/context-dependent/tricky difference between accessing an array using the former or latter subscript notation?
$ perl -le 'my @a = qw(io tu egli); print $a[1], @a[1]'
The output seems to be the same in both cases.
$a[...] # array element
returns the one element identified by the index expression, and
@a[...] # array slice
returns all the elements identified by the index expression.
As such,
You should use $a[EXPR] when you mean to access a single element in order to convey this information to the reader. In fact, you can get a warning if you don't.
You should use @a[LIST] when you mean to access many elements or a variable number of elements.
But that's not the end of the story. You asked for practical and tricky (subtle?) differences, and there's one no one has mentioned yet: The index expression for an array element is evaluated in scalar context, while the index expression for an array slice is evaluated in list context.
sub f { return @_; }
$a[ f(4,5,6) ] # Same as $a[3]
@a[ f(4,5,6) ] # Same as $a[4],$a[5],$a[6]
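A runnable version of that comparison, as a sketch. The array contents are invented so that each element names its own index, which makes the result easy to read:

```perl
use strict;
use warnings;

sub f { return @_ }

# Hypothetical data: element N holds the string "itemN".
my @a = map { "item$_" } 0 .. 9;

# Scalar context: "return @_" yields the number of arguments (3).
my $single = $a[ f(4, 5, 6) ];

# List context: "return @_" yields the arguments themselves.
my @many = @a[ f(4, 5, 6) ];

print "$single\n";   # item3
print "@many\n";     # item4 item5 item6
```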
If you turn on warnings (which you always should) you would see this:
Scalar value @a[1] better written as $a[1]
when you use @a[1].
The @ sigil means "give me a list of something." When used with an array subscript, it retrieves a slice of the array. For example, @foo[0..3] retrieves the first four items in the array @foo.
When you write @a[1], you're asking for a one-element slice from @a. That's perfectly OK, but it's much clearer to ask for a single value, $a[1], instead. So much so that Perl will warn you if you do it the first way.
The first yields a scalar variable while the second gives you an array slice .... Very different animals!!

Chained reference in Perl

How should I understand the following two lines of Perl code?
%{$self->{in1}->{sv1}} = %{$cs->{out}->{grade}};
and
@{$self->{in1}->{sv1value}} = @{$cs->{out}->{forcast}};
Both of them use hashes and hash references in a chained manner, except that the first one uses % and the second one is an array object using @. What are the resulting differences here? I am not very clear about them.
In the first one $self->{in1}->{sv1} and $cs->{out}->{grade} are both references to hashes. So the line:
%{$self->{in1}->{sv1}} = %{$cs->{out}->{grade}};
Is replacing the contents of the hash referenced by $self->{in1}->{sv1} with the contents of the hash referenced by $cs->{out}->{grade}.
NOTE: This is very different to:
$self->{in1}->{sv1} = $cs->{out}->{grade}
Which just makes them reference the same hash.
The second line is doing the same thing except it is arrays which are referenced, not hashes.
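The difference between copying the contents and copying the reference shows up once the source hash changes. A minimal sketch, assuming made-up stand-ins for $self and $cs (the real objects in the question are not shown):

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the objects in the question.
my $cs   = { out => { grade => { math => 'A' } } };
my $self = { in1 => { sv1 => {} } };

# Copy the contents: the two hashes stay independent.
%{ $self->{in1}->{sv1} } = %{ $cs->{out}->{grade} };
$cs->{out}->{grade}->{math} = 'B';
my $after_copy = $self->{in1}->{sv1}->{math};

# Copy the reference: both paths now lead to the same hash.
$self->{in1}->{sv1} = $cs->{out}->{grade};
$cs->{out}->{grade}->{math} = 'C';
my $after_share = $self->{in1}->{sv1}->{math};

print "$after_copy $after_share\n";   # A C
```

The same reasoning applies to the array line, with @{...} in place of %{...}.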
You answered your own question. The first line copies a hash to a hash and the second line copies an array to an array!! In other words $self->{in1}->{sv1} is a reference to a hash and $self->{in1}->{sv1value} is a reference to an array.

What do these lines in `dna2protein.pl` do?

I'm a newbie to Perl and I found a script that converts a DNA sequence to a protein sequence using Perl. I don't understand what some lines in that script do, especially the following:
my(%g)=('TCA'=>'S','TCC'=>'S','TCG'=>'S','TCT'=>'S','TTC'=>'F','TTT'=>'F','TTA'=>'L','TTG'=>'L','TAC'=>'Y','TAT'=>'Y','TAA'=>'_','TAG'=>'_','TGC'=>'C','TGT'=>'C','TGA'=>'_','TGG'=>'W','CTA'=>'L','CTC'=>'L','CTG'=>'L','CTT'=>'L','CCA'=>'P','CCC'=>'P','CCG'=>'P','CCT'=>'P','CAC'=>'H','CAT'=>'H','CAA'=>'Q','CAG'=>'Q','CGA'=>'R','CGC'=>'R','CGG'=>'R','CGT'=>'R','ATA'=>'I','ATC'=>'I','ATT'=>'I','ATG'=>'M','ACA'=>'T','ACC'=>'T','ACG'=>'T','ACT'=>'T','AAC'=>'N','AAT'=>'N','AAA'=>'K','AAG'=>'K','AGC'=>'S','AGT'=>'S','AGA'=>'R','AGG'=>'R','GTA'=>'V','GTC'=>'V','GTG'=>'V','GTT'=>'V','GCA'=>'A','GCC'=>'A','GCG'=>'A','GCT'=>'A','GAC'=>'D','GAT'=>'D','GAA'=>'E','GAG'=>'E','GGA'=>'G','GGC'=>'G','GGG'=>'G','GGT'=>'G');
if(exists $g{$codon})
{
return $g{$codon};
}
else
{
print STDERR "Bad codon \"$codon\"!!\n";
exit;
}
Can someone please explain?
My perl is rusty but anyway.
The first line creates a hash (which is Perl's version of a hash table). The variable is called g (a bad name BTW). The % sigil before g is used to indicate that it is a hash. Perl uses sigils to denote types. The hash is initialised using the double-barrelled arrow syntax (=>). 'TTT'=>'F' creates an entry TTT in the hash table with value F. The my is used to give the variable a lexical (block-level) scope.
The next few lines are fairly self-explanatory. The code checks whether the hash contains an entry with key $codon. The $ sigil is used to indicate that it's a scalar value. If it exists, you get the value. Otherwise, it prints the specified message to standard error and exits.
Since you're new to Perl, you should read a little about Perl itself before you try to decrypt its syntax on your own. (Perl values a good Huffman encoding, and is also somewhat encrypted. ;-) Start with the 'perldoc perlintro' command, and go from there. If you're using Ubuntu, for instance, this documentation can be installed via
$ sudo apt-get install perl-doc
but it is also available in this file: Perl Reference documentation
In addition to perlintro, some other suggested reading is perlsyn (syntax description), perldata (data structures), perlop (operators, including quotes), perlreftut (intro to references), and perlvar (predefined variables and their meanings), in roughly that order.
I learnt perl from these, and I still refer to them often.
Also, if your DNA script has POD documentation, then you can view that neatly by typing
$ perldoc <script-filename>
(Of course, POD documentation is also present in the source, in a rougher form; read perlpod for more details on the documentation format.)
If you are new to Perl with an interest to understand more quickly, you might begin with this web collection learn.perl. A nice supplement is the online Perl documentation of perldoc. Good luck and have fun.
In this case it looks like the %g hash serves both as a way to identify whether a codon is within the set of valid codons (hash keys) and as a mapping to what type of codon it is (hash value).
Hashes serve as a way to link unique keys with a value, but they also serve as unique lists of keys. In some cases you may see keys added to a hash and set to undef. This is a good sign that the hash is being used to track unique values of some type.
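As a sketch of that second use, here is a hash acting purely as a set of unique keys. The @codons list and the %seen name are made up for the example:

```perl
use strict;
use warnings;

# Hypothetical input with repeated codons.
my @codons = qw(TTT TCA TTT ATG TCA);

# Only the keys matter here, so every value is left as undef.
my %seen;
$seen{$_} = undef for @codons;

my @unique = sort keys %seen;
print "@unique\n";   # ATG TCA TTT

# Membership is tested with exists, since the values themselves are undef.
print exists $seen{ATG} ? "yes" : "no", "\n";   # yes
```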
The codon is passed in to the function and upper-cased, and then a hash of codons is checked to see whether a codon of that value is registered. If the codon exists, the registered value for that codon is returned; otherwise an error is output and the program ends.
The my (%g) creates a hash, which is a structure that allows you to quickly look up a value by giving a key for that value. So, for instance, 'TCA'=>'S' maps the key 'TCA' to the value 'S'. If you ask the g hash for the value held for 'TCA', you will get 'S' (that is, $g{'TCA'} equals 'S').
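Putting the pieces together, the lookup might be sketched like this. The sub name codon2aa is hypothetical, the table is abbreviated to four entries, and unlike the original script this version returns undef for an unknown codon rather than printing an error and exiting:

```perl
use strict;
use warnings;

# Abbreviated codon table; the full script lists all 64 codons.
my %g = ( 'TCA' => 'S', 'TTT' => 'F', 'ATG' => 'M', 'TAA' => '_' );

# Hypothetical helper: returns the amino acid letter, or undef for an
# unknown codon (the original script prints an error and exits instead).
sub codon2aa {
    my ($codon) = @_;
    $codon = uc $codon;    # accept lower-case input
    return exists $g{$codon} ? $g{$codon} : undef;
}

my $protein = join '', map { codon2aa($_) } qw(atg ttt tca);
print "$protein\n";   # MFS
```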

Perl: basic question about hashmap

$hash_map{$key}->{$value1} = 1;
I'm just a beginner at perl and I need help in this expression, what does this expression mean? I assume that a new key/value pair will be created but what is the meaning of 1 here?
What you've got here is a hash of hashes, or a two-level hash. $hash_map{$key} holds a hash reference, which points to another hash. $hash_map{$key}{$value1} (the arrow can be omitted in this case) is a particular key in the second hash. The 1 is the value being assigned to that hash key.
For more on this topic, see Perl Data Structures Cookbook section on Hashes of Hashes, and also see the Perl reference tutorial for how references work.
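A short sketch of that two-level structure, with made-up values for $key and $value1. Note that the inner hash does not need to be created first; Perl brings it into existence on assignment (autovivification):

```perl
use strict;
use warnings;

my %hash_map;
my ($key, $value1) = ('fruit', 'apple');   # hypothetical values

# Perl creates the inner hash automatically (autovivification).
$hash_map{$key}->{$value1} = 1;
$hash_map{$key}{banana}    = 1;            # same form without the arrow

my @inner_keys = sort keys %{ $hash_map{$key} };
print ref $hash_map{$key}, "\n";   # HASH
print "@inner_keys\n";             # apple banana
```

Storing 1 as the value is a common way to record that a key is present, so the inner hash can later be tested with a plain truth check.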