I'm trying to implement a subroutine that calculates the d-neighbors of an input string. This is part of an implementation of planted motif search, but my question is much more general. Here is the code:
#subroutine for generating d-neighbors
sub generate_d_neighbors{
# $sequence is the sequence to generate d-neighbors from
# $HD is the Hamming Distance
my ($sequence, $HD) = @_;
for(my $i = 0; $i=$HD; $i++){
my @l = ['A', 'C', 'T', 'G'];
my @t = splice(@l,$sequence[$i]);
#TODO
}
}
The error is occurring at the last line, saying that:
Global symbol "@sequence" requires explicit package name (did you forget to declare "my @sequence"?)
It was my understanding that Perl does not take parameters in the form subroutine(param1, param2) like in Java for example, but why is $sequence not being recognized as already having been initialized?
There are some problems with your code:
sub generate_d_neighbors{
my ($sequence, $HD) = @_;
for(my $i = 0; $i=$HD; $i++){
my @l = ['A', 'C', 'T', 'G'];
my @t = splice(@l,$sequence[$i]);
}
}
First, let's look at
for(my $i = 0; $i=$HD; $i++){
Assuming $HD is nonzero, this loop will never terminate because the condition will never be false. If you wanted $i to range from 0 to $HD, writing the statement as for my $i (0 .. $HD) would have been better.
Second, you have
my @t = splice(@l,$sequence[$i]);
where you seem to assume there is an array @sequence and you are trying to access its element at index $i. However, $sequence is a reference to an array. Therefore, you should use
$sequence->[$i]
Third (thanks @Ikegami), you have
my @l = ['A', 'C', 'T', 'G'];
in the body of the for-loop. Then @l will contain a single element, a reference to an anonymous array containing the elements 'A', 'C', 'T', and 'G'. Instead, use:
my @l = qw(A C T G);
I am not sure exactly what you want to achieve with splice(@l, $sequence->[$i]), but that can be better written as:
my @t = @l[0 .. ($sequence->[$i] - 1)];
In fact, you could reduce the two assignments to:
my @t = qw(A C T G)[0 .. ($sequence->[$i] - 1)];
It looks to me like you want
substr($sequence, 0, 1)
instead of
$sequence[0].
In Perl, strings are first-class scalar values, not a type of array.
Or maybe you want splice(@l, $sequence->[0])?
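If the input really is a plain string rather than an array reference, per-character access could be sketched as follows. The helper name chars_of and the sample sequence are made up for illustration; they are not part of the original code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the characters of a string as a list.
sub chars_of {
    my ($sequence) = @_;
    return split //, $sequence;
}

my $sequence = 'ACGT';

# substr(EXPR, OFFSET, LENGTH) extracts one character at a time.
for my $i (0 .. length($sequence) - 1) {
    print "$i: ", substr($sequence, $i, 1), "\n";
}

# Alternatively, split the string into an array once and index that.
my @chars = chars_of($sequence);
print "third character: $chars[2]\n";
```

Splitting once is usually the more convenient choice when the same string is indexed many times.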
This list-assignment syntax:
my (@sequence, $HD) = @_;
doesn't do what you might want it to do (put the last argument in $HD and the rest in @sequence). The array always takes all the arguments it can, leaving none for whatever comes after it.
Reversing the order can work, for cases where there is only one array:
my ($HD, @sequence) = @_;
and you make the corresponding change in the caller.
To solve the problem more generally, use a reference:
my ($sequence, $HD) = @_;
and call the sub like this:
generate_d_neighbors(\@foo, $bar);
or this:
# Note the brackets, which make an array reference, unlike parentheses
# which would result in a flat list.
generate_d_neighbors([...], 42);
If you use a prototype:
sub generate_d_neighbors (\@$)
then the caller can say
generate_d_neighbors(@foo, $bar);
and the @foo automatically becomes a reference as if it had been \@foo.
If you use any of the reference-based solutions, you must alter the body of the function to use $sequence instead of @sequence, following these rules:
Change @sequence to @$sequence
Change $#sequence to $#$sequence
Change $sequence[...] to $sequence->[...]
Change @sequence[...] to @$sequence[...] (but be sure you really meant to use an array slice... if you're new to Perl you probably didn't mean it, and should have used $sequence[...] instead)
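As a rough illustration of those conversion rules, here is a small sketch. The sub describe_sequence and its sample data are invented for this example; only the dereferencing syntax is the point.

```perl
use strict;
use warnings;

# Hypothetical sub: takes an array reference and a scalar.
sub describe_sequence {
    my ($sequence, $HD) = @_;

    my $count = @$sequence;        # was: scalar(@sequence)
    my $last  = $#$sequence;       # was: $#sequence
    my $first = $sequence->[0];    # was: $sequence[0]
    my @head  = @$sequence[0, 1];  # was: @sequence[0, 1] (an array slice)

    return "count=$count last=$last first=$first head=@head HD=$HD";
}

my @foo = qw(A C G T);
print describe_sequence(\@foo, 2), "\n";           # reference to a named array
print describe_sequence([qw(T G C A)], 1), "\n";   # anonymous array reference
```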
When I assign the Perl @ARGV array to a variable, if I don't use the quotes, it gives me the number of strings in the array, and not the strings in the array.
What is this called - I thought it was dereferencing, but it is not. Right now I am calling it one more thing I need to memorize in Perl.
#!/usr/bin/perl
use strict ;
use warnings;
my $str = "@ARGV" ;
#my $str = @ARGV ;
#my $str = 'geeks, for, geeks';
my @spl = split(', ' , $str);
foreach my $i (@spl) {
print "$i\n" ;
}
If you assign an array to a scalar in Perl, you get the number of elements in the array.
my @array = (1, 1, 2, 3, 5, 8, 13);
my $scalar = @array; # $scalar contains 7
This is known as "evaluating an array in scalar context".
If you expand an array in a double-quoted string in Perl, you get the elements of the array separated by spaces.
my @array = (1, 1, 2, 3, 5, 8, 13);
my $scalar = "@array"; # $scalar contains '1 1 2 3 5 8 13'
This is known as "interpolating an array in a double-quoted string".
Actually, in that second example, the elements are separated by the current contents of the $" variable. And the default value of that variable is a space. But you can change it.
my @array = (1, 1, 2, 3, 5, 8, 13);
$" = '+';
my $scalar = "@array"; # $scalar contains '1+1+2+3+5+8+13'
To store a reference to the array, you can either take a reference to the existing array:
my $scalar = \@array;
Or create a new, anonymous array using the elements of the original array:
my $scalar = [ @array ];
Because we don't know what you are actually trying to do, we can't recommend which of these is the best approach.
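The four behaviours described in this answer can be compared side by side in one short sketch. The variable names are arbitrary, and local is used here (an addition to the original answer) so that the change to $" stays inside one block.

```perl
use strict;
use warnings;

my @array = (1, 1, 2, 3, 5, 8, 13);

my $count  = @array;       # scalar context: number of elements
my $joined = "@array";     # interpolation: elements joined with $" (a space by default)

my $plus;
{
    local $" = '+';        # change the separator only inside this block
    $plus = "@array";
}

my $ref  = \@array;        # reference to the same array
my $copy = [ @array ];     # reference to a new, independent copy

push @array, 21;           # visible through $ref, but not through $copy

print "$count\n";
print "$joined\n";
print "$plus\n";
print scalar(@$ref), "\n";
print scalar(@$copy), "\n";
```

The last two lines show the practical difference between the two reference styles: $ref follows later changes to @array, while $copy keeps the seven elements it was built from.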
Perl works by context. The one that you see here is scalar versus array context. In scalar context, you want one thing, so Perl gives you the one thing that probably makes sense. Recognize the context and you can probably suss out what's going on.
When you have a scalar on the left side of an assignment, you have scalar context because you want to end up with one thing:
my $one_thing = ...
Put an array on the right side, and you have an array in scalar context. The design of Perl decided that the most common thing people probably want in that case is the number of elements:
my $one_thing = @array;
This works with some other builtins too. The localtime builtin returns a single string in scalar context (a timestamp):
my $uid = localtime; # Tue Mar 17 11:39:47 2020
But, in array context, you want possibly multiple things (where that could be two, or one, or zero, or ten thousand, or...). In that case, localtime returns a list of things:
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
localtime();
You already know some of this though, probably. The + operator uses its operands as numbers, but the . operator uses them as strings:
my $sum = '123' + 14;
my $string = '123' . 14;
Perl's philosophy is that it is going to try to do what the verbs (operators, builtins, functions) are trying to do, not what the nouns (variable or value type) might imply. Many languages tell the verbs what to do based on the nouns, so fitting Perl into one of those mental models usually doesn't work out. You don't have to memorize a lot; I've been doing this quite a while and I still refer to the docs often.
We go through a lot of this philosophical explanation in Learning Perl.
The idiom you are looking for is one of
my $str = \@ARGV;
my $str = [ @ARGV ];
These both assign an array reference to the scalar variable $str. You can then get back the elements of @ARGV when you dereference $str. For example,
for my $i (@$str) {
print "$i\n";
}
(Some people prefer @{$str}, which does the same thing)
\ is the reference operator, which returns a reference to whatever is on its right hand side.
[...] creates a new array reference out of whatever is contained between the brackets.
"@array" is a stringify operation on an array, and equivalent to join($", @array)
And finally, a scalar assignment from an array, like
$n = @array
returns the number of elements in the array.
Sometimes you need the total number of elements in an array. $#array gives the index of the last element, not the count:
@array = ("a".."z");
$re = $#array;
print ("$re\n");
result: 25
Because indexing starts at 0, add one to the last index to get the total number of elements:
@ay = ("a", "b", "c");
$re = $#ay;
$re = $re +1;
print ("$re\n");
result: 3
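A small sketch contrasting the last index with the element count; the array names are arbitrary. Using scalar(@array) avoids the manual "+1" step.

```perl
use strict;
use warnings;

my @letters = ('a' .. 'z');

my $last_index = $#letters;          # index of the last element
my $count      = scalar(@letters);   # number of elements

print "last index: $last_index\n";
print "count: $count\n";

# For an empty array the last index is -1 and the count is 0.
my @empty;
print "empty: ", $#empty, ", ", scalar(@empty), "\n";
```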
I was creating a multi-dimensional array this way:
#!/usr/bin/perl
use warnings;
use strict;
my @a1 = (1, 2);
my @a2 = (@a1, 3);
But it turns out that I still got a one-dimensional array...
What's the right way in Perl?
You get a one-dimensional array because the array @a1 is expanded inside the parentheses. So, assuming:
my @a1 = (1, 2);
my @a2 = (@a1, 3);
Then your second statement is equivalent to my @a2 = (1,2,3);.
When creating a multi-dimensional array, you have a few choices:
Direct assignment of each value
Referencing an existing array
Inserting a reference
The first option is basically $array[0][0] = 1; and is not very exciting.
The second is doing this: my @a2 = (\@a1, 3);. Note that this makes a reference to the namespace for the array @a1, so if you later change @a1, the values inside @a2 will also change. It is not always a recommended option.
A variation of the second option is doing this: my @a2 = ([1,2], 3);. The brackets will create an anonymous array, which has no namespace, only a memory address, and will only exist inside @a2.
The third option, a bit more obscure, is doing this: my $a1 = [1,2]; my @a2 = ($a1, 3);. It will do exactly the same thing as 2, only the array reference is already in a scalar variable, called $a1.
Note the difference between () and [] when assigning to arrays. Brackets [] create an anonymous array, which returns an array reference as a scalar value (for example, that can be held by $a1, or $a2[0]).
Parentheses, on the other hand, do nothing at all really, except change the precedence of operators.
Consider this piece of code:
my @a2 = 1, 2, 3;
print "@a2";
This will print 1. If you use warnings, you will also get a warning such as: Useless use of a constant in void context. Basically, this happens:
my @a2 = 1;
2, 3;
Because commas (,) have a lower precedence than the assignment operator =. (See "Operator Precedence and Associativity" in perldoc perlop.)
Parentheses simply override the default precedence of = and ,, and group 1,2,3 together in a list, which is then assigned to @a2.
So, in short, brackets, [], have some magic in them: They create anonymous arrays. Parentheses, (), just change precedence, much like in math.
There is much to read in the documentation. Someone here once showed me a very good link for dereferencing, but I don't recall what it was. In perldoc perlreftut you will find a basic tutorial on references. And in perldoc perldsc you will find documentation on data structures (thanks Oesor for reminding me).
I would propose to work through perlreftut, perldsc and perllol, preferably in the same day and preferably using Data::Dumper to print data structures.
The tutorials complement each other and I think they would take better effect together. Visualizing data structures helped me a lot to believe they actually work (seriously) and to see my mistakes.
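To make the Data::Dumper suggestion concrete, here is a minimal sketch that dumps a flattened array next to two nested variants. The variable names @flat, @nested and @copied are chosen for this example only.

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Indent   = 1;  # more compact indentation
$Data::Dumper::Sortkeys = 1;  # stable key order, if hashes are dumped

my @a1 = (1, 2);

my @flat   = (@a1, 3);     # @a1 is flattened: three scalars
my @nested = (\@a1, 3);    # first element is a reference to @a1
my @copied = ([@a1], 3);   # first element is a reference to a copy of @a1

print Dumper(\@flat, \@nested, \@copied);

print scalar(@flat), "\n";     # number of elements in the flat array
print scalar(@nested), "\n";   # number of elements in the nested array
print $nested[0][1], "\n";     # second element of the inner array
```

Changing @a1 afterwards would show up through @nested but not through @copied, which is the aliasing caveat mentioned in the answer above.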
Arrays contain scalars, so you need to add a reference.
my @a1 = (1,2);
my @a2 = (\@a1, 3);
You'll want to read http://perldoc.perl.org/perldsc.html.
The most important thing to understand about all data structures in Perl--including multidimensional arrays--is that even though they might appear otherwise, Perl @ARRAYs and %HASHes are all internally one-dimensional. They can hold only scalar values (meaning a string, number, or a reference). They cannot directly contain other arrays or hashes, but instead contain references to other arrays or hashes.
Now, because the top level contains only references, if you try to print out your array with a simple print() function, you'll get something that doesn't look very nice, like this:
@AoA = ( [2, 3], [4, 5, 7], [0] );
print $AoA[1][2];
7
print @AoA;
ARRAY(0x83c38)ARRAY(0x8b194)ARRAY(0x8b1d0)
That's because Perl doesn't (ever) implicitly dereference your variables. If you want to get at the thing a reference is referring to, then you have to do this yourself using either prefix typing indicators, like ${$blah} , @{$blah} , @{$blah[$i]} , or else postfix pointer arrows, like $a->[3] , $h->{fred} , or even $ob->method()->[3]
Source: perldoc
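As an illustration of the explicit dereferencing the quoted passage describes, the following sketch prints the same array of arrays row by row and shows three equivalent ways to reach one element. It reuses the @AoA data from the quote; nothing else is taken from the documentation.

```perl
use strict;
use warnings;

my @AoA = ( [2, 3], [4, 5, 7], [0] );

# Each element of @AoA is an array reference, so dereference it with @$row.
for my $row (@AoA) {
    print join(' ', @$row), "\n";
}

# Equivalent ways to reach the same element:
print $AoA[1][2], "\n";        # the arrow is optional between subscripts
print $AoA[1]->[2], "\n";      # explicit arrow
print ${ $AoA[1] }[2], "\n";   # prefix (block) dereference
```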
Now coming to your question. Here's your code:
my @a1 = (1,2);
my @a2 = (@a1,3);
Notice that the arrays contain scalar values. So you have to use a reference, which you create by putting the \ operator before the name of the array to be referenced.
Like this:
my @a2 = (\@a1, 3);
Inner arrays should be scalar references in the outer one:
my @a2 = (\@a1,3); # first element is a reference to a1
print ${$a2[0]}[1]; # print second element of inner array
This is a simple example of a 2D array as ref:
my $AoA = undef;
for(my $i=0; $i<3; $i++) {
for(my $j=0; $j<3; $j++) {
$AoA->[$i]->[$j] = rand(); # Assign some value
}
}
strftime(), as per cpan.org:
print strftime($template, @lt);
I just can't figure the right Perl code recipe for this one. It keeps reporting an error where I call strftime():
...
use Date::Format;
...
sub parse_date {
if ($_[0]) {
$_[0] =~ /(\d{4})/;
my $y = $1;
$_[0] =~ s/\d{4}//;
$_[0] =~ /(\d\d)\D(\d\d)/;
return [$2,$1,$y];
}
return [7,7,2010];
}
foreach my $groupnode ($groupnodes->get_nodelist) {
my $groupname = $xp->find('name/text()', $groupnode);
my $entrynodes = $xp->find('entry', $groupnode);
for my $entrynode ($entrynodes->get_nodelist) {
...
my $date_added = parse_date($xp->find('date_added/text()', $entrynode));
...
$groups{$groupname}{$entryname} = {...,'date_added'=>$date_added,...};
...
}
}
...
my $imday = $maxmonth <= 12 ? 0 : 1;
...
while (my ($groupname, $entries) = each %groups) {
...
while (my ($entryname, $details) = each %$entries) {
...
my $d = @{$details->{'date_added'}};
$writer->dataElement("creation", strftime($date_template, (0,0,12,@$d[0^$imday],@$d[1^$imday]-1,@$d[2],0,0,0)));
}
...
}
...
If I use () to pass the required array by strftime(), I get:
Type of arg 2 to Date::Format::strftime must be array (not list) at ./blah.pl line 87, near "))"
If I use [] to pass the required array, I get:
Type of arg 2 to Date::Format::strftime must be array (not anonymous list ([])) at ./blah.pl line 87, near "])"
How can I pass an array on the fly to a sub in Perl? This can easily be done with PHP, Python, JS, etc. But I just can't figure it with Perl.
EDIT: I reduced the code to these few lines, and I still got the exact same problem:
#!/usr/bin/perl
use warnings;
use strict;
use Date::Format;
my @d = [7,13,2010];
my $imday = 1;
print strftime( q"%Y-%m-%dT12:00:00", (0,0,12,$d[0^$imday],$d[1^$imday]-1,$d[2],0,0,0));
Where an array is required and you have an ad hoc list, you need to actually create an array. It doesn't need to be a separate variable, you can do just:
strftime(
$date_template,
@{ [0,0,12,$d[0^$imday],$d[1^$imday],$d[2],0,0,0] }
);
I have no clue why Date::Format would subject you to this hideousness and not just expect multiple scalar parameters; seems senseless (and contrary to how other modules implement strftime). Graham Barr usually designs better interfaces than this. Maybe it dates from when prototypes still seemed like a cool idea for general purposes.
To use a list as an anonymous array for, say, string interpolation, you could write
print "@{[1, 2, 3]}\n";
to get
1 2 3
The same technique provides a workaround to Date::Format::strftime's funky prototype:
print strftime(q"%Y-%m-%dT12:00:00",
@{[0,0,12,$d[0^$imday],$d[1^$imday]-1,$d[2],0,0,0]});
Output:
1900-24709920-00T12:00:00
Normally, it is easy to pass arrays "on-the-fly" to Perl subroutines. But Date::Format::strftime is a special case with a special prototype ($\@;$) that doesn't allow "list" arguments or "list assignment" arguments:
strftime($format, (0,0,12,13,7-1,2010-1900)); # not ok
strftime($format, @a=(0,0,12,13,7-1,2010-1900)); # not ok
The workaround is that you must call strftime with an array variable.
my @time = (0,0,12,13,7-1,2010-1900); # note: @array = ( ... ), not [ ... ]
strftime($format, @time);
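The effect of a \@ prototype slot can be tried without installing Date::Format. The sketch below uses a made-up sub, needs_array, that merely has the same prototype; it is a stand-in for illustration and does not reproduce what strftime does.

```perl
use strict;
use warnings;

# Hypothetical stand-in with the same prototype as Date::Format::strftime.
# The \@ slot forces the caller to supply something that starts with @.
sub needs_array ($\@;$) {
    my ($format, $fields) = @_;    # $fields arrives as an array reference
    return sprintf $format, @$fields;
}

# A named array satisfies the prototype.
my @time = (12, 0, 0);
print needs_array('%02d:%02d:%02d', @time), "\n";

# So does dereferencing an anonymous array built on the fly.
print needs_array('%02d:%02d:%02d', @{ [ 12, 30, 0 ] }), "\n";

# A plain list is rejected at compile time, so this line stays commented out:
# needs_array('%02d:%02d:%02d', (12, 0, 0));
```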
I looked again and I see the real problem in this code:
my $d = @{$details->{'date_added'}};
$writer->dataElement("creation", strftime($date_template, (0,0,12,@$d[0^$imday],@$d[1^$imday]-1,@$d[2],0,0,0)));
Specifically @{$details->{'date_added'}} is a dereference. But you're assigning it to a scalar variable and you don't need to dereference in the line below it:
my @d = @{$details->{'date_added'}};
$writer->dataElement("creation", strftime($date_template, (0,0,12,$d[0^$imday],$d[1^$imday]-1,$d[2],0,0,0)));
I've created a regular array @d from your reference and just accessed it as a regular array ( $d[ ... ] instead of @$d[ ... ] )
If I pass a hash to a sub:
parse(\%data);
Should I assign $_[0] to a variable first, or is it okay to keep accessing $_[0] whenever I want to get an element from the hash? For clarification:
sub parse
{ $var1 = $_[0]->{'elem1'};
$var2 = $_[0]->{'elem2'};
$var3 = $_[0]->{'elem3'};
$var4 = $_[0]->{'elem4'};
$var5 = $_[0]->{'elem5'};
}
# Versus
sub parse
{ my $hr = $_[0];
$var1 = $hr->{'elem1'};
$var2 = $hr->{'elem2'};
$var3 = $hr->{'elem3'};
$var4 = $hr->{'elem4'};
$var5 = $hr->{'elem5'};
}
Is the second version more correct since it doesn't have to keep accessing the argument array, or does Perl end up interpreting them the same way anyhow?
In this case there is no difference because you are passing a reference to a hash. But when passing a scalar there is a difference:
sub rtrim {
## remove trailing spaces from first argument
$_[0] =~ s/\s+$//;
}
rtrim($str); ## value of the variable will be changed
sub rtrim_bugged {
my $str = $_[0]; ## this makes a copy of variable
$str =~ s/\s+$//;
}
rtrim_bugged($str); ## value of the variable will stay the same
If you're passing a hash reference, then only a copy of the reference is created, but the hash itself will be the same. So if you care about code readability, I suggest you create a variable for each of your parameters. For example:
sub parse {
## you can easily add new parameters to this function
my ($hr) = @_;
my $var1 = $hr->{'elem1'};
my $var2 = $hr->{'elem2'};
my $var3 = $hr->{'elem3'};
my $var4 = $hr->{'elem4'};
my $var5 = $hr->{'elem5'};
}
Also more descriptive variable names will improve your code too.
For a general discussion of the efficiency of shift vs accessing @_ directly, see:
Is there a difference between Perl's shift versus assignment from @_ for subroutine parameters?
Is 'shift' evil for processing Perl subroutine parameters?
As for your specific code, I'd use shift, but simplify the data extraction with a hash slice:
sub parse
{
my $hr = shift;
my ($var1, $var2, $var3, $var4, $var5) = @{$hr}{qw(elem1 elem2 elem3 elem4 elem5)};
}
I'll assume that this method does something else with these variables that makes it worthwhile to keep them in separate variables (perhaps the hash is read-only, and you need to make some modifications before inserting them into some other data?) -- otherwise why not just leave them in the hashref where they started?
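For readers who have not met hash slices before, here is a small runnable sketch of the same idea. The keys name and size and the sample data are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical sub: pull several values out of a hash reference at once.
sub parse {
    my $hr = shift;
    # @{$hr}{...} is a hash slice: it returns the values for the listed keys, in order.
    my ($name, $size) = @{$hr}{qw(name size)};
    return "$name is $size bytes";
}

my %data = (name => 'report.txt', size => 2048, owner => 'alice');
print parse(\%data), "\n";
```

Keys that are not listed in the slice, such as owner here, are simply left alone in the hash.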
You are micro-optimizing; try to avoid that. Go with whatever is most readable/maintainable. Usually this would be the one where you use a lexical variable, since its name indicates its purpose...but if you use a name like $data or $x this obviously doesn't apply.
In terms of the technical details, for most purposes you can estimate the time taken by counting the number of basic ops perl will use. For your $_[0], an element lookup in a non-lexical array variable takes multiple ops: one to get the glob, one to get the array part of the glob, one or more to get the index (just one for a constant), and one to look up the element. $hr, on the other hand is a single op. To cater to direct users of @_, there's an optimization that reduces the ops for $_[0] to a single combined op (when the index is between 0 and 255 inclusive), but it isn't used in your case because the hash-deref context requires an additional flag on the array element lookup (to support autovivification) and that flag isn't supported by the optimized op.
In summary, using a lexical is going to be both more readable and (if you use it more than once) imperceptibly faster.
My rule is that I try not to use $_[0] in subroutines that are longer than a couple of statements. After that everything gets a user-defined variable.
Why are you copying all of the hash values into variables? Just leave them in the hash where they belong. That's a much better optimization than the one you are thinking about.
It's the same, although the second is clearer.
Since they work, both are fine; the common practice is to shift off parameters.
sub parse { my $hr = shift; my $var1 = $hr->{'elem1'}; }
I am trying to get a Perl loop to work that is working from an array that contains 6 elements. I want the loop to pull out two elements from the array, perform certain functions, and then loop back and pull out the next two elements from the array until the array runs out of elements. Problem is that the loop only pulls out the first two elements and then stops. Some help here would be greatly appreciated.
my open(infile, 'dnadata.txt');
my @data = < infile>;
chomp @data;
#print @data; #Debug
my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
my $aalen = length($aminoacids);
my $i=0;
my $j=0;
my @matrix =();
for(my $i=0; $i<2; $i++){
for( my $j=0; $j<$aalen; $j++){
$matrix[$i][$j] = 0;
}
}
The guidelines for this program state that it should ignore the presence of gaps, which means that DNA code that is matched up with a gap should be ignored. So the code that is pushed through needs to have alignments linked with gaps removed.
I need to divide the length of the array by two since I am comparing two sequences in this part of the loop.
#$lemseqcomp = $lenarray / 2;
#print $lenseqcomp;
#I need to initialize these scalar values.
$junk1 = " ";
$junk2 = " ";
$seq1 = " ";
$seq2 = " ";
This is the loop that is causing issues. I believe that the first loop should move back to the array and pull out the next element each time it loops but it doesn't.
for($i=0; $i<$lenarray; $i++){
#This code should remove the last value of the array once and
#then a second time. The sequences should be the same length at this point.
my $last1 =pop(@data1);
my $last2 =pop(@data1);
for($i=0; $i<length($last1); $i++){
my $letter1 = substr($last1, $i, 1);
my $letter2 = substr($last2, $i, 1);
if(($letter1 eq '-')|| ($letter2 eq '-')){
#I need to put the sequences I am getting rid of somewhere. Here is a good place as any.
$junk1 = $letter1 . $junk1;
$junk2 = $letter1 . $junk2;
}
else{
$seq1 = $letter1 . $seq1;
$seq2 = $letter2 . $seq2;
}
}
}
print "$seq1\n";
print "$seq2\n";
print "@data1\n";
I am actually trying to create a substitution matrix from scratch and return the data. The reason why the code looks weird, is because it isn't actually finished yet and I got stuck.
This is the test sequence if anyone is curious.
YFRFR
YF-FR
FRFRFR
ARFRFR
YFYFR-F
YFRFRYF
First off, if you're going to work with sequence data, use BioPerl. Life will be so much easier. However...
Since you know you'll be comparing the lines from your input file as pairs, it makes sense to read them into a data structure that reflects that. As suggested elsewhere, an array like @data = ([line1, line2], [line3, line4]) ensures that the correct pairs of lines are always together.
What I'm not clear on is what you're trying to do:
a) are you generating a consensus sequence where the 2 sequences differ only by gaps, or
b) are your 2 sequences significantly different and you're trying to exclude the non-aligning parts and then generate a consensus?
So, does the first pair represent your data, or is it more like the second?
ATCG---AAActctgGGGGG--taGC
ATCGcccAAActctgGGGGGTTtaGC
ATCG---AAActctgGGGGG--taGCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
ATCGcccAAActctgGGGGGTTtaGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
The problem is that you're using $i as the counter variable for both your loops, so the inner loop modifies the counter out from under the outer loop. Try changing the inner loop's counter to $j, or using my to localize them properly.
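A minimal sketch of the fix, assuming the data has already been grouped into pairs (the @pairs layout and the $visited counter are made up for this example): each loop declares its own lexical counter, so the inner loop cannot move the outer one.

```perl
use strict;
use warnings;

my @pairs = (['YFRFR', 'YF-FR'], ['FRFRFR', 'ARFRFR']);
my $visited = 0;

# Each loop declares its own counter with my, so the inner loop
# cannot change the outer loop's position.
for my $i (0 .. $#pairs) {
    my $first = $pairs[$i][0];
    for my $j (0 .. length($first) - 1) {
        $visited++;    # stands in for the per-character work
    }
}

print "$visited\n";    # 5 + 6 = 11 character positions visited
```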
Don't store your values as a flat array; store them as a two-dimensional array:
my @dataset = ([$val1, $val2], [$val3, $val4]);
or
my @dataset;
push (@dataset, [$val_n1, $val_n2]);
Then:
for my $value (@dataset) {
### Do stuff with $value->[0] and $value->[1]
}
There are lots of strange things in your code: you are initializing a matrix then not using it; reading a whole file into an array; scanning a string C style but then not doing anything with the unmatched values; and finally, just printing the two last processed values (which, in your case, are the two first elements of your array, since you are using pop.)
Here's a guess.
use strict;
my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
# Preparing a regular expression. This is kind of useful if processing large
# amounts of data. This will match anything that is not in the string above.
my $regex = qr([^$aminoacids]);
# Our work function.
sub do_something {
my ($a, $b) = @_;
$a =~ s/$regex//g; # removing unwanted characters
$b =~ s/$regex//g; # ditto
# Printing, saving, whatever...
print "Something: $a - $b\n";
return ($a, $b);
}
my $prev;
while (<>) {
chomp;
if ($prev) {
do_something($prev, $_);
$prev = undef;
} else {
$prev = $_;
}
}
print STDERR "Warning: trailing data: $prev\n"
if $prev;
Since you are a total Perl/programming newbie, I am going to show a rewrite of your first code block, then I'll offer you some general advice and links.
Let's look at your first block of sample code. There is a lot of stuff all strung together, and it's hard to follow. I, personally, am too dumb to remember more than a few things at a time, so I chop problems into small pieces that I can understand. This is (was) known as 'chunking'.
One easy way to chunk your program is to write subroutines. Take any particular action or idea that is likely to be repeated or would make the current section of code long and hard to understand, and wrap it up into a nice neat package and get it out of the way.
It also helps if you add space to your code to make it easier to read. Your mind is already struggling to grok the code soup, why make things harder than necessary? Grouping like things, using _ in names, blank lines and indentation all help. There are also conventions that can help, like making constant values (values that cannot or should not change) all capital letters.
use strict; # Using strict will help catch errors.
use warnings; # ditto for warnings.
use diagnostics; # diagnostics will help you understand the error messages
# Put constants at the top of your program.
# It makes them easy to find, and change as needed.
my $AMINO_ACIDS = 'ARNDCQEGHILKMFPSTWYV';
my $AMINO_COUNT = length($AMINO_ACIDS);
my $DATA_FILE = 'dnadata.txt';
# Here I am using subroutines to encapsulate complexity:
my @data = read_data_file( $DATA_FILE );
my @matrix = initialize_matrix( 2, $AMINO_COUNT, 0 );
# now we are done with the first block of code and can do more stuff
...
# This section down here looks kind of big, but it is mostly comments.
# Remove the didactic comments and suddenly the code is much more compact.
# Here are the actual subs that I abstracted out above.
# It helps to document your subs:
# - what they do
# - what arguments they take
# - what they return
# Read a data file and returns an array of dna strings read from the file.
#
# Arguments
# data_file => path to the data file to read
sub read_data_file {
my $data_file = shift;
# Here I am using a 3 argument open, and a lexical filehandle.
open( my $infile, '<', $data_file )
or die "Unable to open $data_file - $!\n";
# I've left slurping the whole file intact, even though it can be very inefficient.
# Other times it is just what the doctor ordered.
my @data = <$infile>;
chomp @data;
# I return the data array rather than a reference
# to keep things simple since you are just learning.
#
# In my code, I'd pass a reference.
return @data;
}
# Initialize a matrix (or 2-d array) with a specified value.
#
# Arguments
# $i => width of matrix
# $j => height of matrix
# $value => initial value
sub initialize_matrix {
my $i = shift;
my $j = shift;
my $value = shift;
# I use two powerful perlisms here: map and the range operator.
#
# map is a list construction function that is very very powerful.
# it calls the code in brackets for each member of the list it operates against.
# Think of it as a for loop that keeps the result of each iteration,
# and then builds an array out of the results.
#
# The range operator `..` creates a list of intervening values. For example:
# (1..5) is the same as (1, 2, 3, 4, 5)
my @matrix = map {
[ ($value) x $i ]
} 1..$j;
# So here we make a list of numbers from 1 to $j.
# For each member of the list we
# create an anonymous array containing a list of $i copies of $value.
# Then we add the anonymous array to the matrix.
return @matrix;
}
Now that the code rewrite is done, here are some links:
Here's a response I wrote titled "How to write a program". It offers some basic guidelines on how to approach writing software projects from specification. It is aimed at beginners. I hope you find it helpful. If nothing else, the links in it should be handy.
For a beginning programmer, beginning with Perl, there is no better book than Learning Perl.
I also recommend heading over to Perlmonks for Perl help and mentoring. It is an active Perl specific community site with very smart, friendly people who are happy to help you. Kind of like Stack Overflow, but more focused.
Good luck!
Instead of using a C-style for loop, you can read data from an array two elements at a time using splice inside a while loop:
while (my ($letter1, $letter2) = splice(@data, 0, 2))
{
# stuff...
}
I've cleaned up some of your other code below:
use strict;
use warnings;
open(my $infile, '<', 'dnadata.txt') or die "Unable to open dnadata.txt: $!";
my @data = <$infile>;
close $infile;
chomp @data;
my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
my $aalen = length($aminoacids);
# initialize a 2 x 20 array for holding the amino acid data
my $matrix;
foreach my $i (0 .. 1)
{
foreach my $j (0 .. $aalen-1)
{
$matrix->[$i][$j] = 0;
}
}
# Process all letters in the DNA data
while (my ($letter1, $letter2) = splice(@data, 0, 2))
{
# do something... not sure what?
# you appear to want to look up the letters in a reference table, perhaps $aminoacids?
}
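One possible way to fill in the loop body, sketched under the assumption that "ignoring gaps" means dropping every column where either sequence has a '-'. The helper strip_gap_columns is hypothetical, and the question's test sequences are used in place of the file contents.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every column where either sequence has a gap ('-').
sub strip_gap_columns {
    my ($seq1, $seq2) = @_;
    my ($out1, $out2) = ('', '');
    # Only compare as far as the shorter of the two sequences.
    my $len = length($seq1) < length($seq2) ? length($seq1) : length($seq2);
    for my $k (0 .. $len - 1) {
        my $c1 = substr($seq1, $k, 1);
        my $c2 = substr($seq2, $k, 1);
        next if $c1 eq '-' || $c2 eq '-';
        $out1 .= $c1;
        $out2 .= $c2;
    }
    return ($out1, $out2);
}

# The test sequences from the question, in place of the file contents.
my @data = qw(YFRFR YF-FR FRFRFR ARFRFR YFYFR-F YFRFRYF);

# splice removes two elements per iteration until @data is empty.
while (my ($first, $second) = splice(@data, 0, 2)) {
    my ($clean1, $clean2) = strip_gap_columns($first, $second);
    print "$clean1 $clean2\n";
}
```

Note that splice empties @data as it goes; copy the array first if the original lines are needed afterwards.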