What do these lines in `dna2protein.pl` do? - perl

I'm a newbie to Perl and I found a script that converts a DNA sequence to a protein sequence. I don't understand what some lines in that script do, especially the following:
my(%g)=('TCA'=>'S','TCC'=>'S','TCG'=>'S','TCT'=>'S','TTC'=>'F','TTT'=>'F','TTA'=>'L','TTG'=>'L','TAC'=>'Y','TAT'=>'Y','TAA'=>'_','TAG'=>'_','TGC'=>'C','TGT'=>'C','TGA'=>'_','TGG'=>'W','CTA'=>'L','CTC'=>'L','CTG'=>'L','CTT'=>'L','CCA'=>'P','CCC'=>'P','CCG'=>'P','CCT'=>'P','CAC'=>'H','CAT'=>'H','CAA'=>'Q','CAG'=>'Q','CGA'=>'R','CGC'=>'R','CGG'=>'R','CGT'=>'R','ATA'=>'I','ATC'=>'I','ATT'=>'I','ATG'=>'M','ACA'=>'T','ACC'=>'T','ACG'=>'T','ACT'=>'T','AAC'=>'N','AAT'=>'N','AAA'=>'K','AAG'=>'K','AGC'=>'S','AGT'=>'S','AGA'=>'R','AGG'=>'R','GTA'=>'V','GTC'=>'V','GTG'=>'V','GTT'=>'V','GCA'=>'A','GCC'=>'A','GCG'=>'A','GCT'=>'A','GAC'=>'D','GAT'=>'D','GAA'=>'E','GAG'=>'E','GGA'=>'G','GGC'=>'G','GGG'=>'G','GGT'=>'G');
if(exists $g{$codon})
{
return $g{$codon};
}
else
{
print STDERR "Bad codon \"$codon\"!!\n";
exit;
}
Can someone please explain?

My Perl is rusty, but anyway.
The first line creates a hash (which is Perl's version of a hash table). The variable is called g (a bad name, BTW). The % sigil before g indicates that it is a hash. Perl uses sigils to denote types. The hash is initialised using the fat-arrow (=>) syntax. 'TTT'=>'F' creates an entry with key TTT and value F in the hash table. The my gives the variable a lexical (local) scope.
The next few lines are fairly self-explanatory. They check whether the hash contains an entry with key $codon. The $ sigil indicates that it's a scalar value. If it exists, you get the value. Otherwise, the script prints the given message to standard error and exits.
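As a rough sketch of that lookup: the sub name codon2aa and the shortened table are my own assumptions for illustration, and this version returns undef for an unknown codon where the script in the question exits.

```perl
use strict;
use warnings;

# Shortened codon table for illustration; the script in the question lists all 64 codons.
my %codon_table = (
    'ATG' => 'M',   # methionine
    'TTT' => 'F',   # phenylalanine
    'TAA' => '_',   # stop codon
);

# Hypothetical helper: returns the amino acid letter for a codon, or
# undef (after a message on STDERR) if the codon is not a key of the hash.
sub codon2aa {
    my ($codon) = @_;
    if (exists $codon_table{$codon}) {
        return $codon_table{$codon};
    }
    print STDERR "Bad codon \"$codon\"!!\n";
    return;
}

my $aa = codon2aa('ATG');
print "$aa\n";   # prints M
```

Returning undef rather than calling exit is only a choice made for this sketch, so that the caller can decide what to do with a bad codon.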

Since you're new to Perl, you should read a little about Perl itself before you try to decrypt its syntax on your own. (Perl values a good Huffman encoding, and is also somewhat encrypted. ;-) Start with the 'perldoc perlintro' command, and go from there. If you're using Ubuntu, for instance, this documentation can be installed via
$ sudo apt-get install perl-doc
but it is also available in this file: Perl Reference documentation
In addition to perlintro, some other suggested reading is perlsyn (syntax description), perldata (data structures), perlop (operators, including quotes), perlreftut (intro to references), and perlvar (predefined variables and their meanings), in roughly that order.
I learnt perl from these, and I still refer to them often.
Also, if your DNA script has POD documentation, then you can view that neatly by typing
$ perldoc <script-filename>
(Of course, POD documentation is also present in the source, in a rougher form; read perlpod for more details on the documentation format.)

If you are new to Perl with an interest to understand more quickly, you might begin with this web collection learn.perl. A nice supplement is the online Perl documentation of perldoc. Good luck and have fun.

In this case it looks like the %g hash serves both as a way to identify whether a codon is within the set of valid codons (hash keys) and as a mapping to what type of codon it is (hash value).
Hashes serve as a way to link unique keys with a value, but they also serve as unique lists of keys. In some cases you may see keys added to a hash and set to undef. This is a good sign that the hash is being used to track unique values of some type.
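Here is a small sketch of the "hash as a unique list" idea; the input list and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical input: a list of codons with repeats.
my @codons = qw(ATG TTT ATG TAA TTT);

# Use the hash as a set: only the keys matter, so the values are left undef.
my %seen;
$seen{$_} = undef for @codons;

# keys returns each distinct codon once, in no particular order, hence the sort.
my @unique = sort keys %seen;
print "@unique\n";   # prints ATG TAA TTT
```

Because the values are undef, membership has to be tested with exists rather than by looking at the value.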

The codon is passed in to the function and upper-cased, and then a hash of codons is checked to see whether a codon of that value is registered. If the codon exists, the registered value for that codon is returned; otherwise an error is output and the program ends.
The my (%g) creates a hash, which is a structure that allows you to quickly look up a value by giving a key for that value. So, for instance, 'TCA'=>'S' maps the key 'TCA' to the value 'S'. If you ask the g hash for the value held for 'TCA' you will get 'S' ($g{'TCA'} # will equal 'S').

Related

Perl variables defined with * vs $

What's the difference between defining a variable with a * vs a $? For example:
local $var;
local *var;
The initial character is known as a sigil, and says what sort of value the identifier represents. You will know most of them. Here's a list
Dollar $ is a scalar value
At sign @ is an array value
Percent % is a hash value
Ampersand & is a code value
Asterisk * is a typeglob
You are less likely to have come across the last two recently, because & hasn't been necessary when calling subroutines since Perl 5.0 was released. And typeglobs are a special type that contains all of the other types, and are much more rarely used.
I'm considering how much deeper to go into all of this, but will leave my answer as it is for now. I may write more depending on the comments that arise.
$var is a scalar. *var is a typeglob. http://perldoc.perl.org/perldata.html#Typeglobs-and-Filehandles
It's not a variable in the strictest sense. You shouldn't generally be using it.
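To make the sigils and the typeglob a bit more concrete, here is a small sketch. It uses package variables (our) because typeglobs only deal with those, and the name alias is just an illustrative choice.

```perl
use strict;
use warnings;

# The same identifier can name a scalar, an array, a hash and a sub.
our $var = 'scalar';
our @var = (1, 2, 3);
our %var = (key => 'value');
sub var { return 'code' }

# A typeglob holds all of those slots. Assigning one glob to another
# aliases every slot at once.
our ($alias, @alias, %alias);
*alias = *var;

my $from_sub = alias();
print "$alias\n";             # prints scalar
print scalar(@alias), "\n";   # prints 3
print "$alias{key}\n";        # prints value
print "$from_sub\n";          # prints code
```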

How does this Perl one-liner actually work?

So, I happened to notice that last.fm is hiring in my area, and since I've known a few people who worked there, I thought of applying.
But I thought I'd better take a look at the current staff first.
Everyone on that page has a cute/clever/dumb strapline, like "Is life not a thousand times too short for us to bore ourselves?". In fact, it was quite amusing, until I got to this:
perl -e'print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34'
Which I couldn't resist pasting into my terminal (kind of a stupid thing to do, maybe), but it printed:
Just another Last.fm hacker,
I thought it would be relatively easy to figure out how that Perl one-liner works. But I couldn't really make sense of the documentation, and I don't know Perl, so I wasn't even sure I was reading the relevant documentation.
So I tried modifying the numbers, which got me nowhere. So I decided it was genuinely interesting and worth figuring out.
So, 'how does it work' being a bit vague, my question is mainly,
What are those numbers? Why are there negative numbers and positive numbers, and does the negativity or positivity matter?
What does the combination of operators +=$_ do?
What's pack+q,c*,, doing?
This is a variant on “Just another Perl hacker”, a Perl meme. As JAPHs go, this one is relatively tame.
The first thing you need to do is figure out how to parse the perl program. It lacks parentheses around function calls and uses the + and quote-like operators in interesting ways. The original program is this:
print+pack+q,c*,,map$.+=$_,74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34
pack is a function, whereas print and map are list operators. Either way, a function or non-nullary operator name immediately followed by a plus sign can't be using + as a binary operator, so both + signs at the beginning are unary operators. This oddity is described in the manual.
If we add parentheses, use the block syntax for map, and add a bit of whitespace, we get:
print(+pack(+q,c*,,
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
The next tricky bit is that q here is the q quote-like operator. It's more commonly written with single quotes:
print(+pack(+'c*',
map{$.+=$_} (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21,
18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34)))
Remember that the unary plus is a no-op (it has no effect on its operand, even on strings), so things should now be looking more familiar. This is a call to the pack function, with a format of c*, meaning “any number of characters, specified by their number in the current character set”. An alternate way to write this is
print(join("", map {chr($.+=$_)} (74, …, -34)))
The map function applies the supplied block to the elements of the argument list in order. For each element, $_ is set to the element value, and the result of the map call is the list of values returned by executing the block on the successive elements. A longer way to write this program would be
my @list_accumulator = ();
for my $n (74, …, -34) {
    $. += $n;
    push @list_accumulator, chr($.);
}
print(join("", @list_accumulator));
The $. variable contains a running total of the numbers. The numbers are chosen so that the running total is the ASCII codes of the characters the author wants to print: 74=J, 74+43=117=u, 74+43-2=115=s, etc. They are negative or positive depending on whether each character is before or after the previous one in ASCII order.
For your next task, explain this JAPH (produced by EyesDrop).
''=~('(?{'.('-)@.)@_*([]@!@/)(@)@-@),@(@@+@)'
^'][)@]`}`]()`@.@]@%[`}%[@`@!#@%[').',"})')
Don't use any of this in production code.
The basic idea behind this is quite simple. You have an array containing the ASCII values of the characters. To make things a little bit more complicated you don't use absolute values, but relative ones except for the first one. So the idea is to add the specific value to the previous one, for example:
74 -> J
74 + 43 -> u
74 + 43 + (-2) -> s
Even though $. is a special variable in Perl it does not mean anything special in this case. It is just used to save the previous value and add the current element:
map($.+=$_, ARRAY)
Basically it means: add the current list element ($_) to the variable $. and return the new total. The map call as a whole returns a new list with the correct ASCII values for the sentence.
The q function in Perl is used for single quoted, literal strings. E.g. you can use something like
q/Literal $1 String/
q!Another literal String!
q,Third literal string,
This means that pack+q,c*,, is basically pack 'c*', ARRAY. The c* template tells pack to treat each value in the list as a character code and emit the corresponding character.
It basically boils down to this:
#!/usr/bin/perl
use strict;
use warnings;
my $prev_value = 0;
my @relative = (74,43,-2,1,-84, 65,13,1,5,-12,-3, 13,-82,44,21, 18,1,-70,56, 7,-77,72,-7,2, 8,-6,13,-70,-34);
my @absolute = map($prev_value += $_, @relative);
print pack("c*", @absolute);
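Going the other way may also help to see what the numbers are. The following sketch derives the relative values from the target string; the variable names are my own, and the trailing newline is included because the last number in the one-liner (-34) steps from the comma (44) down to 10.

```perl
use strict;
use warnings;

# Sketch of the reverse direction: derive the relative numbers from a string.
my $text = "Just another Last.fm hacker,\n";

my $prev = 0;
my @relative = map {
    my $delta = ord($_) - $prev;   # distance from the previous character code
    $prev = ord($_);
    $delta;
} split //, $text;

print join(",", @relative), "\n";   # begins 74,43,-2,1,-84
```

A number is negative whenever the next character has a lower code than the one before it, for example the space (32) after "t" (116) gives -84.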

How to get certain parts of a perl hash

I'm learning perl and using Weather::NOAA::Alert and am wanting to figure out how to capture only a certain part of its output.
It outputs a hash, but I only want a certain part, for example the urgency part... what should I do?
Sample output
It's not an array; it's a hash of hashes of hashes. You can access values like this:
$result->{'US'}->{'http://alerts.weather.gov/cap/wwacapget.php?x=MT124CAB8F109C.WinterWeatherAdvisory.124CAB90FBA0MT.TFXWSWTFX.c906fc319cc9f5b747e95ac455f8c2f0'}->{'certainty'}
which will contain the string
Likely
Check http://www.cs.mcgill.ca/~abatko/computers/programming/perl/howto/hash/ for an introduction on Perl hashes.
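If you want one field from every alert instead of a single hard-coded key, a loop over the outer levels might look like the sketch below. The data is made up to mimic the three-level shape described above; the URL key and the field values are placeholders, not real module output.

```perl
use strict;
use warnings;

# Made-up data with the same three-level shape (region -> alert URL -> field).
my $result = {
    'US' => {
        'http://example.invalid/alert-1' => {
            'certainty' => 'Likely',
            'urgency'   => 'Expected',
        },
    },
};

# Walk the two outer levels and pull one field out of each alert.
my @urgencies;
for my $region (sort keys %$result) {
    for my $alert_url (sort keys %{ $result->{$region} }) {
        push @urgencies, $result->{$region}{$alert_url}{'urgency'};
    }
}
print "$_\n" for @urgencies;   # prints Expected
```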

Why does the Perl CGI module use hyphens to start named arguments?

I am a novice. My question is: what does the "-" before the keys (type, expires, name, etc.) stand for? Why not just use the plain hash table way and discard the hyphen?
#!/usr/local/bin/perl -w
use CGI;
$q = CGI->new;
print $q->header(-type=>'image/gif',-expires=>'+3d');
$q->param(-name=>'veggie',-value=>'tomato');
The author already explained this in the documentation:
Most CGI.pm routines accept several arguments, sometimes as many as 20 optional ones! To simplify this interface, all routines use a named argument calling style that looks like this:
print $q->header(-type=>'image/gif',-expires=>'+3d');
Each argument name is preceded by a dash. Neither case nor order matters in the argument list. -type, -Type, and -TYPE are all acceptable. In fact, only the first argument needs to begin with a dash. If a dash is present in the first argument, CGI.pm assumes dashes for the subsequent ones.
Several routines are commonly called with just one argument. In the case of these routines you can provide the single argument without an argument name. header() happens to be one of these routines. In this case, the single argument is the document type.
print $q->header('text/html');
See perlop:
If the operand is an identifier, a string consisting of a minus sign concatenated with the identifier is returned. Otherwise, if the string starts with a plus or minus, a string starting with the opposite sign is returned. One effect of these rules is that -bareword is equivalent to the string "-bareword". (emphasis mine)
This is just an older style of perl arguments that isn't usually used in newer modules. It's not exactly deprecated, it's just an older style based on how Perl allows you to not quote your hash keys if they start with a dash.
I don't know what you mean by the 'plain hashtable way'. The way CGI.pm is implemented, names of properties are (in most cases) required to be preceded by '-', presumably so that they can be identified.
Or to put it another way, the hash key that CGI.pm's header method uses to identify the 'type' property is '-type'.
That's just the way CGI.pm is defined.
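A short sketch of what the dash does at the language level. The normalise_args sub is hypothetical and only illustrates how a routine could accept dashed names in any case; it is not CGI.pm's actual code.

```perl
use strict;
use warnings;

# -type is an identifier preceded by unary minus, so it is the string "-type".
my %args = (-type => 'image/gif', -expires => '+3d');
print join(",", sort keys %args), "\n";   # prints -expires,-type

# Hypothetical sub that accepts dashed names in any case and
# normalises them to plain lower-case keys.
sub normalise_args {
    my %in = @_;
    my %out;
    for my $name (keys %in) {
        (my $clean = lc $name) =~ s/^-//;   # lower-case, then drop the leading dash
        $out{$clean} = $in{$name};
    }
    return %out;
}

my %clean = normalise_args(-Type => 'text/html');
print "$clean{type}\n";   # prints text/html
```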

Perl: basic question about hashmap

$hash_map{$key}->{$value1} = 1;
I'm just a beginner at Perl and I need help with this expression. What does it mean? I assume that a new key/value pair will be created, but what is the meaning of the 1 here?
What you've got here is a hash of hashes, or a two-level hash. $hash_map{$key} holds a hash reference, which points to another hash. $hash_map{$key}{$value1} (the arrow can be omitted in this case) is a particular key in the second hash. The 1 is the value being assigned to that hash key.
For more on this topic, see Perl Data Structures Cookbook section on Hashes of Hashes, and also see the Perl reference tutorial for how references work.
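A minimal sketch of that assignment in action; the key names are placeholders chosen for illustration.

```perl
use strict;
use warnings;

my %hash_map;
my ($key, $value1) = ('fruit', 'apple');   # placeholder values

# The inner hash does not exist yet; Perl creates it on assignment
# (autovivification) and stores 1 under the inner key.
$hash_map{$key}->{$value1} = 1;

print ref($hash_map{$key}), "\n";     # prints HASH
print "$hash_map{$key}{$value1}\n";   # prints 1

# One possible use: the 1 acts as a "seen" flag that can be tested later.
print "apple seen\n" if exists $hash_map{fruit}{apple};
```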