Difference between %something and %{$something}? - perl

Two (almost?) similar questions actually.
What is the difference between %something and %{$something} ?
What is the difference between %{$hashvar{xyz}} and %hashvar{xyz} ?

%something means something is a hash variable, and %{$something} means that $something is a scalar variable that contains a reference to a hash, which %{...} dereferences.
%{$hashvar{xyz}} means that $hashvar{xyz} (the value associated with key xyz in the hash %hashvar) is a hash reference.
Starting in Perl 5.20, %hashvar{xyz} is a key/value hash slice that returns the list ('xyz', $hashvar{xyz}); before that, it is a syntax error.
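To make the first two points concrete, here is a minimal sketch. The sample keys and values are made up for illustration, and %something and $something are deliberately two unrelated variables that only share a name.

```perl
use strict;
use warnings;

my %something = (a => 1, b => 2);   # a hash variable
my $something = { c => 3 };         # an unrelated scalar holding a hash reference

my @direct  = sort keys %something;      # keys of the hash variable
my @via_ref = sort keys %{$something};   # keys of the hash that $something points to

# A hash whose value under key xyz is itself a hash reference.
my %hashvar = (xyz => { inner => 42 });
my %copy    = %{ $hashvar{xyz} };        # dereference, then copy

print "@direct | @via_ref | $copy{inner}\n";   # a b | c | 42
```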
References:
perldata - Perl data types
perldsc - Perl Data Structures Cookbook

Related

What does the % mean in Perl?

It's been many years since I have worked in Perl and I'm finding myself needing to get back up to speed. I have a line of code whose meaning I do not know.
%form = %$input;
Can someone explain or direct me to where I can find the answer to this? I am not understanding the % symbol.
The short answer is that the line copies the contents of the hash that $input refers to into the named hash %form. The programmer is likely uncomfortable with reference syntax. No big whoop.
The key concept that people miss about Perl 5's sigils is that they show how you are treating a variable, not what type it is. $ is single items, @ is multiple items. % is hash stuff.
The % signifies a hash variable as a whole. So, %form is the entire hash named "form". But, to get a single element out of it, you use the $ (single element) sigil. When you see the {} after a variable, you know you are dealing with a hash:
%form # entire hash named "form"
$form{foo} # single value for key "foo" in hash form
@form{qw(foo bar)} # multiple values for keys "foo" and "bar" (slice)
The second one is more tricky (it's the stuff we cover in Intermediate Perl). $input is a reference to a hash. All references are scalars (so, the $ sigil). To use it as a hash, you have to dereference it. For a simple scalar like that, you can put the hash sigil in front: %$input. Now you can treat that as a hash and use the hash operators (keys, values, delete) on it.
Starting with v5.24 (experimentally since v5.20), there's also a postfix dereference so you can read your code left to right: $input->%*.
%$input # entire hash referenced by $input
$input->%* # entire hash, with new hotness postfix deref
${$input}{foo} # single element access: extra $ in front, braces around ref
$$input{foo} # same thing
$input->{foo} # single element access with arrow and braces
@{$input}{qw(foo bar)} # hash slice, multiple items get `@`
@$input{qw(foo bar)} # same thing
$input->@{qw(foo bar)} # same thing, but with postfix notation
Now there's an even more tricky thing. v5.20 introduces the key-value slice, so the % gets some more work to do. This is a slice that returns the keys along with the values, so it gets the % for hash like things:
%form{qw(key1 key2)} # returns a list of key-value pairs
But, this also works on arrays to get the index and value. You know it's an array because you see the [], but you know it's returning index-value pairs because you see the %:
%array[1,3,7] # returns list like ( 1, ..., 3, ..., 7, ...)
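Here is a small sketch of both key-value slice forms. It needs Perl v5.20 or later, and the hash and array contents are invented for the example.

```perl
use v5.20;      # key/value slices need 5.20 or later
use warnings;

my %form  = (key1 => 'a', key2 => 'b', key3 => 'c');
my @array = map { "item$_" } 0 .. 7;    # item0 .. item7

my %subset = %form{qw(key1 key2)};      # key/value pairs for two keys
my @pairs  = %array[1, 3, 7];           # index/value pairs

say join ',', map { "$_=$subset{$_}" } sort keys %subset;   # key1=a,key2=b
say "@pairs";                                               # 1 item1 3 item3 7 item7
```

Assigning the slice to a hash, as with %subset, is a handy way to pick a few entries out of a bigger hash.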

In Perl, what is the difference between accessing an array element using @a[$i] as opposed to using $a[$i]?

Basic syntax tutorials I followed do not make this clear:
Is there any practical/philosophical/context-dependent/tricky difference between accessing an array using the former or latter subscript notation?
$ perl -le 'my @a = qw(io tu egli); print $a[1], @a[1]'
The output seems to be the same in both cases.
$a[...] # array element
returns the one element identified by the index expression, and
@a[...] # array slice
returns all the elements identified by the index expression.
As such,
You should use $a[EXPR] when you mean to access a single element in order to convey this information to the reader. In fact, you can get a warning if you don't.
You should use @a[LIST] when you mean to access many elements or a variable number of elements.
But that's not the end of the story. You asked for practical and tricky (subtle?) differences, and there's one no one has mentioned yet: the index expression for an array element is evaluated in scalar context, while the index expression for an array slice is evaluated in list context.
sub f { return @_; }
$a[ f(4,5,6) ] # Same as $a[3]
@a[ f(4,5,6) ] # Same as $a[4],$a[5],$a[6]
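A runnable version of that context difference, as a sketch. The array contents (e0 through e9) are made up so that each element names its own index.

```perl
use strict;
use warnings;

my @a = map { "e$_" } 0 .. 9;   # e0 .. e9, so each value shows its index

sub f { return @_; }            # in scalar context this yields the argument count

my $element = $a[ f(4, 5, 6) ];   # index in scalar context: f returns 3
my @slice   = @a[ f(4, 5, 6) ];   # index in list context: f returns (4, 5, 6)

print "$element | @slice\n";      # e3 | e4 e5 e6
```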
If you turn on warnings (which you always should) you would see this:
Scalar value @a[1] better written as $a[1]
when you use @a[1].
The @ sigil means "give me a list of something." When used with an array subscript, it retrieves a slice of the array. For example, @foo[0..3] retrieves the first four items in the array @foo.
When you write @a[1], you're asking for a one-element slice from @a. That's perfectly OK, but it's much clearer to ask for a single value, $a[1], instead. So much so that Perl will warn you if you do it the first way.
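If you want to see the warning from inside a program, one way is to catch it with a handler. This is only a sketch: it compiles the snippet with a string eval so that the compile-time warning is raised while the handler is installed, and the exact wording of the message may vary between Perl versions.

```perl
use strict;
use warnings;

my @warnings;
{
    # Collect warnings instead of printing them to STDERR.
    local $SIG{__WARN__} = sub { push @warnings, @_ };

    # Compile the snippet at run time, while the handler above is active.
    eval q{
        my @a = qw(io tu egli);
        my ($x) = @a[1];    # one-element slice where $a[1] was meant
        1;
    } or die $@;
}

print @warnings;    # shows whatever warnings were collected
```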
The first yields a scalar variable while the second gives you an array slice .... Very different animals!!

What do these lines in `dna2protein.pl` do?

I'm a newbie to Perl and I found a script to convert a DNA sequence to a protein sequence using Perl. I don't understand what some lines in that script do, especially the following:
my(%g)=('TCA'=>'S','TCC'=>'S','TCG'=>'S','TCT'=>'S','TTC'=>'F','TTT'=>'F','TTA'=>'L','TTG'=>'L','TAC'=>'Y','TAT'=>'Y','TAA'=>'_','TAG'=>'_','TGC'=>'C','TGT'=>'C','TGA'=>'_','TGG'=>'W','CTA'=>'L','CTC'=>'L','CTG'=>'L','CTT'=>'L','CCA'=>'P','CCC'=>'P','CCG'=>'P','CCT'=>'P','CAC'=>'H','CAT'=>'H','CAA'=>'Q','CAG'=>'Q','CGA'=>'R','CGC'=>'R','CGG'=>'R','CGT'=>'R','ATA'=>'I','ATC'=>'I','ATT'=>'I','ATG'=>'M','ACA'=>'T','ACC'=>'T','ACG'=>'T','ACT'=>'T','AAC'=>'N','AAT'=>'N','AAA'=>'K','AAG'=>'K','AGC'=>'S','AGT'=>'S','AGA'=>'R','AGG'=>'R','GTA'=>'V','GTC'=>'V','GTG'=>'V','GTT'=>'V','GCA'=>'A','GCC'=>'A','GCG'=>'A','GCT'=>'A','GAC'=>'D','GAT'=>'D','GAA'=>'E','GAG'=>'E','GGA'=>'G','GGC'=>'G','GGG'=>'G','GGT'=>'G');
if(exists $g{$codon})
{
    return $g{$codon};
}
else
{
    print STDERR "Bad codon \"$codon\"!!\n";
    exit;
}
Can someone please explain?
My perl is rusty but anyway.
The first line creates a hash (which is Perl's version of a hash table). The variable is called g (a bad name BTW). The % sigil before g is used to indicate that it is a hash. Perl uses sigils to denote types. The hash is initialised using the double-barrelled arrow syntax. 'TTT'=>'F' creates an entry TTT in the hash table with value F. The my is used to give the variable a lexical (block-local) scope.
The next few lines are fairly self-explanatory. They check whether the hash contains an entry with key $codon. The $ sigil is used to indicate that it's a scalar value. If it exists, you get the value. Otherwise, the code prints the specified message to standard error and exits.
Since you're new to Perl, you should read a little about Perl itself before you try to decrypt its syntax on your own. (Perl values a good Huffman encoding, and is also somewhat encrypted. ;-) Start with the 'perldoc perlintro' command, and go from there. If you're using Ubuntu, for instance, this documentation can be installed via
$ sudo apt-get install perl-doc
but it is also available in this file: Perl Reference documentation
In addition to perlintro, some other suggested reading is perlsyn (syntax description), perldata (data structures), perlop (operators, including quotes), perlreftut (intro to references), and perlvar (predefined variables and their meanings), in roughly that order.
I learnt perl from these, and I still refer to them often.
Also, if your DNA script has POD documentation, then you can view that neatly by typing
$ perldoc <script-filename>
(Of course, the POD documentation is also present in the source, in a rougher form; read perlpod for more details on the documentation format.)
If you are new to Perl with an interest to understand more quickly, you might begin with this web collection learn.perl. A nice supplement is the online Perl documentation of perldoc. Good luck and have fun.
In this case it looks like the %g hash serves both as a way to check whether a codon is within the set of valid codons (the hash keys) and as a mapping to what type of codon it is (the hash value).
Hashes serve as a way to link unique keys with a value, but they also serve as unique lists of keys. In some cases you may see keys added to a hash and set to undef. This is a good sign that the hash is being used to track unique values of some type.
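As a sketch of that "hash as a set" idiom, assuming a made-up list of codons with duplicates:

```perl
use strict;
use warnings;

my @codons = qw(TCA TTT TCA ATG TTT);   # sample input with repeats

my %seen;
$seen{$_} = undef for @codons;   # the value is irrelevant; only the key matters

my @unique = sort keys %seen;
print "@unique\n";                                           # ATG TCA TTT
print exists $seen{ATG} ? "ATG seen\n" : "ATG not seen\n";   # ATG seen
```

Note that exists is the right test here, because the stored values are all undef.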
The codon is passed in to the function and upper-cased, and then a hash of codons is checked to see whether a codon of that value is registered. If the codon exists, the registered value for that codon is returned; otherwise an error is printed and the program ends.
The my (%g) is creating a hash, which is a structure that allows you to quickly look up a value by giving a key for that value. So for instance 'TCA'=>'S' maps the value 'S' to 'TCA'. If you ask the g hash for the value held for 'TCA' you will get 'S' (that is, $g{'TCA'} will equal 'S').
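The lookup can be tried out with a cut-down sketch. The three-entry table is a subset of the one in the question, and codon2aa is a hypothetical name for the surrounding function, which the question does not show. Unlike the original, this version returns undef for an unknown codon instead of printing to STDERR and exiting, which makes it easier to experiment with.

```perl
use strict;
use warnings;

# Three entries copied from the question's table; the real script has all 64.
my %g = (TCA => 'S', TTT => 'F', ATG => 'M');

# Hypothetical helper: amino acid letter for a codon, or undef if unknown.
sub codon2aa {
    my ($codon) = @_;
    $codon = uc $codon;                              # accept lower-case input
    return exists $g{$codon} ? $g{$codon} : undef;
}

my $protein = join '', map { codon2aa($_) // '?' } qw(atg TTT tca);
print "$protein\n";   # MFS
```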

What does the special variable $#_ mean in Perl?

I encountered this special variable ($#_) while browsing. I tried finding out what it means, but couldn't find anything. Please let me know what this special variable means.
In Perl, you get the index of the last element of @array with the syntax $#array. So $#_ is the index of the last element in the array @_. This is not the same thing as the number of elements in the array (which you get with scalar @array), because Perl arrays are normally 0-based.
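A quick sketch of the difference inside a subroutine, where @_ holds the arguments. The describe function is a made-up name used only for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper that reports both values for its own argument list.
sub describe {
    my $last  = $#_;          # index of the last argument
    my $count = scalar @_;    # number of arguments
    return "last=$last count=$count";
}

print describe(qw(a b c)), "\n";   # last=2 count=3
print describe(), "\n";            # last=-1 count=0
```

With no arguments, $#_ is -1, which is why it cannot stand in for the element count.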

Perl: basic question about hashmap

$hash_map{$key}->{$value1} = 1;
I'm just a beginner at Perl and I need help with this expression: what does it mean? I assume that a new key/value pair will be created, but what is the meaning of 1 here?
What you've got here is a hash of hashes, or a two-level hash. $hash_map{$key} holds a hash reference, which points to another hash. $hash_map{$key}{$value1} (the arrow can be omitted in this case) is a particular key in the second hash. The 1 is the value being assigned to that hash key.
For more on this topic, see Perl Data Structures Cookbook section on Hashes of Hashes, and also see the Perl reference tutorial for how references work.
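Here is a minimal sketch of that assignment in action, with made-up values for $key and $value1. Note that Perl creates the inner hash automatically the first time you assign through it.

```perl
use strict;
use warnings;

my %hash_map;
my ($key, $value1) = ('fruit', 'apple');   # sample values, not from the question

$hash_map{$key}->{$value1} = 1;   # creates the inner hash on first use
$hash_map{$key}{pear}      = 1;   # same kind of access, arrow omitted

my @inner_keys = sort keys %{ $hash_map{$key} };
print ref($hash_map{$key}), ": @inner_keys\n";   # HASH: apple pear
```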