Simple multi-dimensional array with loop in perl - perl

I'm trying to use an array and a loop to print out the following (basically for each letter of the alphabet, print each letter of the alphabet after it and then move on to the next letter). I'm new to perl, anyone have any quick words of :
aa
ab
ac
ad
...
ba
bb
bc
bd
...
ca
cb
...
Currently I have this, but it only prints a single character alphabet...
#arr = ("a","b","c","d","e","f","g","h","i","j","k","l","m","n","o","p","q","r","s","t","u","v","w","x","y","z");
$i = #arr;
while ($i)
{
print $arr[$i];
$i--;
}

Using the range operator and the ranges you want to target:
use strict;
use warnings;
my #elements = ("aa" .. "zz");
for my $combo (#elements)
{
print "$combo\n";
}
You can utilize the initial 2 letters till the ending 2 letters you want as ending and the for will take care of everything.

This really isn't multi-dimensional array work, if it were you'd be working with stuff like:
my #foo = (
[1,2,3],
[4,7,8,1,2,3],
[2,3],
);
This is really a very basic how do I make a nested loop that iterates over the same array. I'll bet this is homework.
So, I'll let you figure out the nesting bits, but give some help with Perl's loop operators.
!! for/foreach
for (the each is optional) is the real heavy hitter for looping in perl. Use it like so:
for my $var ( #array ) {
#do stuff with $var
}
Each element in #array will be aliased to the $var variable, and the block of code will be executed. The fact that we are aliasing, rather than copying means that if alter the value of $var, #array will be changed as well. The stuff between the parenthesis may be any expression. The expression will be evaluated in list context. So if you put a file handle in the parens, the entire file will be read into memory and processed.
You can also leave off naming the loop variable, and $_ will be used instead. In general, DO NOT DO THIS.
!! C-Style for
Every once in a while you need to keep track of indexes as you loop over an array. This is when a C style for loop comes in handy.
for( my $i=0; $i<#array; $i++ ) {
# do stuff with $array[$i]
}
!! While/Until
While and until operate with boolean loop conditions. That means that the loop will repeat as long as the appropriate boolean value if found for the condition ( TRUE for while, and FALSE for until). In addition to the obvious cases where you are looking for a particular condition, while is great for processing a file one line at a time.
while ( my $line = <$fh> ) {
# Do stuff with $line.
}
!! map
map is an amazingly useful bit of functional programming kung-fu. It is used to turn one list into another. You pass an anonymous code reference that is used to enact the transformation.
# Multiply all elements of #old by two and store them in #new.
my #new = map { $_ * 2 } #old;
So how do you solve your particular problem? There are many ways. Which is best depends on how you want to use the results. If you want to create a new array of the letter pairs, use map. If you are interested primarily in a side effect (say printing a variable) use for. If you need to work with really big lists that come from sort of interator (like lines from a filehandle) use while.
Here's a solution. I wouldn't turn it in to your professor until you understand how it works.
print map { my $letter=$_; map "$letter$_\n", "a".."z" } "a".."z";
Look at perldoc articles, perlsyn for info on the looping constructs, perlfunc for info on map and look at perlop for info on the range operator (..).
Good luck.

Use the range operator (..) for your initialization. The range operator basically grabs a range of values such as numbers or characters.
Then use a nested loop to go through the array one time per character for a total of 26^2 iterations.
Rather than a while loop I've used a foreach loop to go through each item in the array. You could also put 'a' .. 'z' instead of declared #arr as the argument to the foreach loop. The foreach loops below set $char or $char2 to each value in #arr in turn.
my #arr = ('a' .. 'z');
for my $char (#arr) {
for my $char2 (#arr) {
print "$char$char2\n";
}
}

If all you really want to do is print the 676 strings you describe, then:
#!/usr/bin/perl
use warnings;
use strict;
my $str = 'aa';
while (length $str < 3) {
print $str++, "\n";
}
But I smell an "XY problem"...

Related

Perl: If two elements match print elements, else iterate until match and then print

I'm new to Perl and I'm trying to iterate over two elements of an array with multiple indices in each element and look for a match. If element2 matches element1, I want to print both and move to the next position in element1 and continue the loop looking for the next match. If I don't have a match, loop until I get a match. Here is what I have:
#array = split(',',$row);
foreach $element1(#array[1])
{
foreach $element2(#array[2])
{
if($element1 == $element2)
{
print "1 = $element1 : 2 = $element2 \n";
}
}
}
I'm not getting the the matched output. I've tried multiple iterations with different syntactical changes.
I can get both elements when I do this:
foreach $element1(#array[1])
{
foreach $element2(#array[2])
{
print "1 = $element1 : 2 = $element2 \n";
}
}
I thought I might not be dereferencing correctly. Any guidance or suggestions would be appreciated. Thanks.
There are a number of issues with your script. Briefly:
You should always use strict and warnings.
Array indices start at 0, not 1.
You get an element of an array with $array[0], not #array[0]. This is a common frustration for new Perl programmers. The thing to remember is that the sigil (the symbol preceding a variable name) indicates the type of value being passed (e.g. $scalar, #array, or %hash) to the left-hand side of the expression, not the type of datastructure being accessed on the right-hand side.
As #sp-asic pointed out in the comments on the OP, string comparisons are performed with eq, not ==.
References to datastructures are stored in scalars, and you dereference by prepending the sigil of the original datastructure. If $foo is a reference to an array, #$foo gets you the original array.
You apparently want to break out of your inner loops when you find a match, but you'll want to make it clear (for people who look at this code in the future, which may include yourself) which loop you're breaking out of.
Most critically, #array will be an array of strings after you split another string (the row) on commas, so it's not clear why you expect to be able to treat the strings in the first and second position as arrays that you can loop through. I have a few guesses about what you're actually trying to do, and what your inputs and expected outputs actually look like, but I'll wait for you to provide some additional information and leave the information above as general guidance in the meantime, along with a lightly-reworked version of your code below.
use strict;
use warnings;
my #array = split(',', $row);
foreach my $element1 (#$array[0]) {
foreach my $element2 (#$array[1]) {
if ($element1 eq $element2) {
print "1 = $element1 : 2 = $element2\n";
last;
}
}
}

How can I determine if an element exists in an array (perl)

I'm looping through an array, and I want to test if an element is found in another array.
In pseudo-code, what I'm trying to do is this:
foreach $term (#array1) {
if ($term is found in #array2) {
#do something here
}
}
I've got the "foreach" and the "do something here" parts down-pat ... but everything I've tried for the "if term is found in array" test does NOT work ...
I've tried grep:
if grep {/$term/} #array2 { #do something }
# this test always succeeds for values of $term that ARE NOT in #array2
if (grep(/$term/, #array2)) { #do something }
# this test likewise succeeds for values NOT IN the array
I've tried a couple different flavors of "converting the array to a hash" which many previous posts have indicated are so simple and easy ... and none of them have worked.
I am a long-time low-level user of perl, I understand just the basics of perl, do not understand all the fancy obfuscated code that comprises 99% of the solutions I read on the interwebs ... I would really, truly, honestly appreciate any answers that are explicit in the code and provide a step-by-step explanation of what the code is doing ...
... I seriously don't grok $_ and any other kind or type of hidden, understood, or implied value, variable, or function. I would really appreciate it if any examples or samples have all variables and functions named with clear terms ($term as opposed to $_) ... and describe with comments what the code is doing so I, in all my mentally deficient glory, may hope to possibly understand it some day. Please. :-)
...
I have an existing script which uses 'grep' somewhat succesfully:
$rc=grep(/$term/, #array);
if ($rc eq 0) { #something happens here }
but I applied that EXACT same code to my new script and it simply does NOT succeed properly ... i.e., it "succeeds" (rc = zero) when it tests a value of $term that I know is NOT present in the array being tested. I just don't get it.
The ONLY difference in my 'grep' approach between 'old' script and 'new' script is how I built the array ... in old script, I built array by reading in from a file:
#array=`cat file`;
whereas in new script I put the array inside the script itself (coz it's small) ... like this:
#array=("element1","element2","element3","element4");
How can that result in different output of the grep function? They're both bog-standard arrays! I don't get it!!!! :-(
########################################################################
addendum ... some clarifications or examples of my actual code:
########################################################################
The term I'm trying to match/find/grep is a word element, for example "word123".
This exercise was just intended to be a quick-n-dirty script to find some important info from a file full of junk, so I skip all the niceties (use strict, warnings, modules, subroutines) by choice ... this doesn't have to be elegant, just simple.
The term I'm searching for is stored in a variable which is instantiated via split:
foreach $line(#array1) {
chomp($line); # habit
# every line has multiple elements that I want to capture
($term1,$term2,$term3,$term4)=split(/\t/,$line);
# if a particular one of those terms is found in my other array 'array2'
if (grep(/$term2/, #array2) {
# then I'm storing a different element from the line into a 3rd array which eventually will be outputted
push(#known, $term1) unless $seen{$term1}++;
}
}
see that grep up there? It ain't workin right ... it is succeeding for all values of $term2 even if it is definitely NOT in array2 ... array1 is a file of a couple thousand lines. The element I'm calling $term2 here is a discrete term that may be in multiple lines, but is never repeated (or part of a larger string) within any given line. Array2 is about a couple dozen elements that I need to "filter in" for my output.
...
I just tried one of the below suggestions:
if (grep $_ eq $term2, #array2)
And this grep failed for all values of $term2 ... I'm getting an all or nothing response from grep ... so I guess I need to stop using grep. Try one of those hash solutions ... but I really could use more explanation and clarification on those.
This is in perlfaq. A quick way to do it is
my %seen;
$seen{$_}++ for #array1;
for my $item (#array2) {
if ($seen{$item}) {
# item is in array2, do something
}
}
If letter case is not important, you can set the keys with $seen{ lc($_) } and check with if ($seen{ lc($item) }).
ETA:
With the changed question: If the task is to match single words in #array2 against whole lines in #array1, the task is more complicated. Trying to split the lines and match against hash keys will likely be unsafe, because of punctuation and other such things. So, a regex solution will likely be the safest.
Unless #array2 is very large, you might do something like this:
my $rx = join "|", #array2;
for my $line (#array1) {
if ($line =~ /\b$rx\b/) { # use word boundary to avoid partial matches
# do something
}
}
If #array2 contains meta characters, such as *?+|, you have to make sure they are escaped, in which case you'd do something like:
my $rx = join "|", map quotemeta, #array2;
# etc
You could use the (infamous) "smart match" operator, provided you are on 5.10 or later:
#!/usr/bin/perl
use strict;
use warnings;
my #array1 = qw/a b c d e f g h/;
my #array2 = qw/a c e g z/;
print "a in \#array1\n" if 'a' ~~ #array1;
print "z in \#array1\n" if 'z' ~~ #array1;
print "z in \#array2\n" if 'z' ~~ #array2;
The example is very simple, but you can use an RE if you need to as well.
I should add that not everyone likes ~~ because there are some ambiguities and, um, "undocumented features". Should be OK for this though.
This should work.
#!/usr/bin/perl
use strict;
use warnings;
my #array1 = qw/a b c d e f g h/;
my #array2 = qw/a c e g z/;
for my $term (#array1) {
if (grep $_ eq $term, #array2) {
print "$term found.\n";
}
}
Output:
a found.
c found.
e found.
g found.
#!/usr/bin/perl
#ar = ( '1','2','3','4','5','6','10' );
#arr = ( '1','2','3','4','5','6','7','8','9' ) ;
foreach $var ( #arr ){
print "$var not found\n " if ( ! ( grep /$var/, #ar )) ;
}
Pattern matching is the most efficient way of matching elements. This would do the trick. Cheers!
print "$element found in the array\n" if ("#array" =~ m/$element/);
Your 'actual code' shouldn't even compile:
if (grep(/$term2/, #array2) {
should be:
if (grep (/$term2/, #array2)) {
You have unbalanced parentheses in your code. You may also find it easier to use grep with a callback (code reference) that operates on its arguments (the array.) It helps keep the parenthesis from blurring together. This is optional, though. It would be:
if (grep {/$term2/} #array2) {
You may want to use strict; and use warnings; to catch issues like this.
The example below might be helpful, it tries to see if any element in #array_sp is present in #my_array:
#! /usr/bin/perl -w
#my_array = qw(20001 20003);
#array_sp = qw(20001 20002 20004);
print "#array_sp\n";
foreach $case(#my_array){
if("#array_sp" =~ m/$case/){
print "My God!\n";
}
}
use pattern matching can solve this. Hope it helps
-QC
1. grep with eq , then
if (grep {$_ eq $term2} #array2) {
print "$term2 exists in the array";
}
2. grep with regex , then
if (grep {/$term2/} #array2) {
print "element with pattern $term2 exists in the array";
}

foreach loop with a condition perl?

Is it possible to have a foreach loop with a condition in Perl?
I'm having to do a lot of character by character processing - where a foreach loop is very convenient. Note that I cannot use some libraries since this is for school.
I could write a for loop using substr with a condition if necessary, but I'd like to avoid that!
You should show us some code, including the sort of thing you would like to do.
In general, character-by-character processing of a string would be done in Perl by writing
for my $ch (split //, $string) { ... }
or, if it is more convenient
my #chars = split //, $string;
for (#chars) { ... }
or
my $i = 0;
while ($i < #chars) { my $char = $chars[$i++]; ... }
and the latter form can support multiple expressions in the while condition. Perl is very rich with different ways to do the similar things, and without knowing more about your problem it is impossible to say which is best for you.
Edit
It is important to note that none of these methods allow the original string to be modified. If that is the intention then you must use s///, tr/// or substr.
Note that substr has a fourth parameter that will replace the specified part of the original string. Note also that it can act as an lvalue and so take an assignment. In other words
substr $string, 0, 1, 'X';
can be written equivalently as
substr($string, 0, 1) = 'X';
If split is used to convert a string into a list of characters (actually one-character strings) then it can be modified in this state and recombined into a string using join. For instance
my #chars = split //, $string;
$chars[0] = 'X';
$string = join '', #chars;
does a similar thing to the above code using substr.
For example:
foreach my $l (#something) {
last if (condition);
# ...
}
will exit the loop if condition is true
You might investigate the next and last directives. More info in perldoc perlsyn.

What does it mean when you try to print an array or hash using Perl and you get, Array(0xd3888)?

What does it mean when you try to print an array or hash and you see the following; Array(0xd3888) or HASH(0xd3978)?
EXAMPLE
CODE
my #data = (
['1_TEST','1_T','1_TESTER'],
['2_TEST','2_T','2_TESTER'],
['3_TEST','3_T','3_TESTER'],
['4_TEST','4_T','4_TESTER'],
['5_TEST','5_T','5_TESTER'],
['6_TEST','6_T','^_TESTER']
);
foreach my $line (#data) {
chomp($line);
#random = split(/\|/,$line);
print "".$random[0]."".$random[1]."".$random[2]."","\n";
}
RESULT
ARRAY(0xc1864)
ARRAY(0xd384c)
ARRAY(0xd3894)
ARRAY(0xd38d0)
ARRAY(0xd390c)
ARRAY(0xd3948)
It's hard to tell whether you meant it or not, but the reason why you're getting array references is because you're not printing what you think you are.
You started out right when iterating over the 'rows' of #data with:
foreach my $line (#data) { ... }
However, the next line is a no-go. It seems that you're confusing text strings with an array structure. Yes, each row contains strings, but Perl treats #data as an array, not a string.
split is used to convert strings to arrays. It doesn't operate on arrays! Same goes for chomp (with an irrelevant exception).
What you'll want to do is replace the contents of the foreach loop with the following:
foreach my $line (#data) {
print $line->[0].", ".$line->[1].", ".$line->[2]."\n";
}
You'll notice the -> notation, which is there for a reason. $line refers to an array. It is not an array itself. The -> arrows deference the array, allowing you access to individual elements of the array referenced by $line.
If you're not comfortable with the idea of deferencing with arrows (and most beginners usually aren't), you can create a temporary array as shown below and use that instead.
foreach my $line (#data) {
my #random = #{ $line };
print $random[0].", ".$random[1].", ".$random[2]."\n";
}
OUTPUT
1_TEST, 1_T, 1_TESTER
2_TEST, 2_T, 2_TESTER
3_TEST, 3_T, 3_TESTER
4_TEST, 4_T, 4_TESTER
5_TEST, 5_T, 5_TESTER
6_TEST, 6_T, ^_TESTER
A one-liner might go something like print "#$_\n" for #data; (which is a bit OTT), but if you want to just print the array to see what it looks like (say, for debugging purposes), I'd recommend using the Data::Dump module, which pretty-prints arrays and hashes for you without you having to worry about it too much.
Just put use Data::Dump 'dump'; at beginning of your script, and then dump #data;. As simple as that!
It means you do not have an array; you have a reference to an array.
Note that an array is specified with round brackets - as a list; when you use the square bracket notation, you are creating a reference to an array.
foreach my $line (#data)
{
my #array = #$line;
print "$array[0] - $array[1] - $array[2]\n";
}
Illustrating the difference:
my #data = (
['1_TEST','1_T','1_TESTER'],
['2_TEST','2_T','2_TESTER'],
['3_TEST','3_T','3_TESTER'],
['4_TEST','4_T','4_TESTER'],
['5_TEST','5_T','5_TESTER'],
['6_TEST','6_T','^_TESTER']
);
# Original print loop
foreach my $line (#data)
{
chomp($line);
#random = split(/\|/,$line);
print "".$random[0]."".$random[1]."".$random[2]."","\n";
}
# Revised print loop
foreach my $line (#data)
{
my #array = #$line;
print "$array[0] - $array[1] - $array[2]\n";
}
Output
ARRAY(0x62c0f8)
ARRAY(0x649db8)
ARRAY(0x649980)
ARRAY(0x649e48)
ARRAY(0x649ec0)
ARRAY(0x649f38)
1_TEST - 1_T - 1_TESTER
2_TEST - 2_T - 2_TESTER
3_TEST - 3_T - 3_TESTER
4_TEST - 4_T - 4_TESTER
5_TEST - 5_T - 5_TESTER
6_TEST - 6_T - ^_TESTER
You're printing a reference to the hash or array, rather than the contents OF that.
In the particular code you're describing I seem to recall that Perl automagically makes the foreach looping index variable (my $line in your code) into an "alias" (a sort of reference I guess) of the value at each stage through the loop.
So $line is a reference to #data[x] ... which is, at each iteration, some array. To get at one of the element of #data[0] you'd need the $ sigil (because the elements of the array at #data[0] are scalars). However $line[0] is a reference to some package/global variable that doesn't exist (use warnings; use strict; will tell you that, BTW).
[Edited after Ether pointed out my ignorance]
#data is a list of anonymous array references; each of which contains a list of scalars. Thus you have to use the sort of explicit de-referencing I describe below:
What you need is something more like:
print ${$line}[0], ${$line}[1], ${$line}[2], "\n";
... notice that the ${xxx}[0] is ensuring that the xxx is derefenced, then indexing is performed on the result of the dereference, which is then extracted as a scalar.
I testing this as well:
print $$line[0], $$line[1], $$line[2], "\n";
... and it seems to work. (However, I think that the first form is more clear, even if it's more verbose).
Personally I chalk this up to yet another gotchya in Perl.
[Further editorializing] I still count this as a "gotchya." Stuff like this, and the fact that most of the responses to this question have been technically correct while utterly failing to show any effort to actually help the original poster, has once again reminded me why I shifted to Python so many years ago. The code I posted works, of course, and probably accomplishes what the OP was attempting. My explanation was wholly wrong. I saw the word "alias" in the `perlsyn` man page and remembered that there were some funky semantics somewhere there; so I totally missed the part that [...] is creating an anonymous reference. Unless you drink from the Perl Kool-Aid in deep drafts then even the simplest code cannot be explained.

Why does my Perl for loop exit early?

I am trying to get a perl loop to work that is working from an array that contains 6 elements. I want the loop to pull out two elements from the array, perform certain functions, and then loop back and pull out the next two elements from the array until the array runs out of elements. Problem is that the loop only pulls out the first two elements and then stops. Some help here would be greatly apperaciated.
my open(infile, 'dnadata.txt');
my #data = < infile>;
chomp #data;
#print #data; #Debug
my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
my $aalen = length($aminoacids);
my $i=0;
my $j=0;
my #matrix =();
for(my $i=0; $i<2; $i++){
for( my $j=0; $j<$aalen; $j++){
$matrix[$i][$j] = 0;
}
}
The guidelines for this program states that the program should ignore the presence of gaps in the program. which means that DNA code that is matched up with a gap should be ignored. So the code that is pushed through needs to have alignments linked with gaps removed.
I need to modify the length of the array by two since I am comparing two sequence in this part of the loop.
#$lemseqcomp = $lenarray / 2;
#print $lenseqcomp;
#I need to initialize these saclar values.
$junk1 = " ";
$junk2 = " ";
$seq1 = " ";
$seq2 = " ";
This is the loop that is causeing issues. I belive that the first loop should move back to the array and pull out the next element each time it loops but it doesn't.
for($i=0; $i<$lenarray; $i++){
#This code should remove the the last value of the array once and
#then a second time. The sequences should be the same length at this point.
my $last1 =pop(#data1);
my $last2 =pop(#data1);
for($i=0; $i<length($last1); $i++){
my $letter1 = substr($last1, $i, 1);
my $letter2 = substr($last2, $i, 1);
if(($letter1 eq '-')|| ($letter2 eq '-')){
#I need to put the sequences I am getting rid of somewhere. Here is a good place as any.
$junk1 = $letter1 . $junk1;
$junk2 = $letter1 . $junk2;
}
else{
$seq1 = $letter1 . $seq1;
$seq2 = $letter2 . $seq2;
}
}
}
print "$seq1\n";
print "$seq2\n";
print "#data1\n";
I am actually trying to create a substitution matrix from scratch and return the data. The reason why the code looks weird, is because it isn't actually finished yet and I got stuck.
This is the test sequence if anyone is curious.
YFRFR
YF-FR
FRFRFR
ARFRFR
YFYFR-F
YFRFRYF
First off, if you're going to work with sequence data, use BioPerl. Life will be so much easier. However...
Since you know you'll be comparing the lines from your input file as pairs, it makes sense to read them into a datastructure that reflects that. As elsewhere suggested, an array like #data[[line1, line2],[line3,line4]) ensures that the correct pairs of lines are always together.
What I'm not clear on what you're trying to do is:
a) are you generating a consensus
sequence where the 2 sequences are
difference only by gaps
b) are your 2 sequences significantly
different and you're trying to
exclude the non-aligning parts and
then generate a consensus?
So, does the first pair represent your data, or is it more like the second?
ATCG---AAActctgGGGGG--taGC
ATCGcccAAActctgGGGGGTTtaGC
ATCG---AAActctgGGGGG--taGCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
ATCGcccAAActctgGGGGGTTtaGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
The problem is that you're using $i as the counter variable for both your loops, so the inner loop modifies the counter out from under the outer loop. Try changing the inner loop's counter to $j, or using my to localize them properly.
Don't store your values as an array, store as a two-dimensional array:
my #dataset = ([$val1, $val2], [$val3, $val4]);
or
my #dataset;
push (#dataset, [$val_n1, $val_n2]);
Then:
for my $value (#dataset) {
### Do stuff with $value->[0] and $value->[1]
}
There are lots of strange things in your code: you are initializing a matrix then not using it; reading a whole file into an array; scanning a string C style but then not doing anything with the unmatched values; and finally, just printing the two last processed values (which, in your case, are the two first elements of your array, since you are using pop.)
Here's a guess.
use strict;
my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
# Preparing a regular expression. This is kind of useful if processing large
# amounts of data. This will match anything that is not in the string above.
my $regex = qr([^$aminoacids]);
# Our work function.
sub do_something {
my ($a, $b) = #_;
$a =~ s/$regex//g; # removing unwanted characters
$b =~ s/$regex//g; # ditto
# Printing, saving, whatever...
print "Something: $a - $b\n";
return ($a, $b);
}
my $prev;
while (<>) {
chomp;
if ($prev) {
do_something($prev, $_);
$prev = undef;
} else {
$prev = $_;
}
}
print STDERR "Warning: trailing data: $prev\n"
if $prev;
Since you are a total Perl/programming newbie, I am going to show a rewrite of your first code block, then I'll offer you some general advice and links.
Let's look at your first block of sample code. There is a lot of stuff all strung together, and it's hard to follow. I, personally, am too dumb to remember more than a few things at a time, so I chop problems into small pieces that I can understand. This is (was) known as 'chunking'.
One easy way to chunk your program is use write subroutines. Take any particular action or idea that is likely to be repeated or would make the current section of code long and hard to understand, and wrap it up into a nice neat package and get it out of the way.
It also helps if you add space to your code to make it easier to read. Your mind is already struggling to grok the code soup, why make things harder than necessary? Grouping like things, using _ in names, blank lines and indentation all help. There are also conventions that can help, like making constant values (values that cannot or should not change) all capital letters.
use strict; # Using strict will help catch errors.
use warnings; # ditto for warnings.
use diagnostics; # diagnostics will help you understand the error messages
# Put constants at the top of your program.
# It makes them easy to find, and change as needed.
my $AMINO_ACIDS = 'ARNDCQEGHILKMFPSTWYV';
my $AMINO_COUNT = length($AMINO_ACIDS);
my $DATA_FILE = 'dnadata.txt';
# Here I am using subroutines to encapsulate complexity:
my #data = read_data_file( $DATA_FILE );
my #matrix = initialize_matrix( 2, $amino_count, 0 );
# now we are done with the first block of code and can do more stuff
...
# This section down here looks kind of big, but it is mostly comments.
# Remove the didactic comments and suddenly the code is much more compact.
# Here are the actual subs that I abstracted out above.
# It helps to document your subs:
# - what they do
# - what arguments they take
# - what they return
# Read a data file and returns an array of dna strings read from the file.
#
# Arguments
# data_file => path to the data file to read
sub read_data_file {
my $data_file = shift;
# Here I am using a 3 argument open, and a lexical filehandle.
open( my $infile, '<', $data_file )
or die "Unable to open dnadata.txt - $!\n";
# I've left slurping the whole file intact, even though it can be very inefficient.
# Other times it is just what the doctor ordered.
my #data = <$infile>;
chomp #data;
# I return the data array rather than a reference
# to keep things simple since you are just learning.
#
# In my code, I'd pass a reference.
return #data;
}
# Initialize a matrix (or 2-d array) with a specified value.
#
# Arguments
# $i => width of matrix
# $j => height of matrix
# $value => initial value
sub initialize_matrix {
my $i = shift;
my $j = shift;
my $value = shift;
# I use two powerful perlisms here: map and the range operator.
#
# map is a list contsruction function that is very very powerful.
# it calls the code in brackets for each member of the the list it operates against.
# Think of it as a for loop that keeps the result of each iteration,
# and then builds an array out of the results.
#
# The range operator `..` creates a list of intervening values. For example:
# (1..5) is the same as (1, 2, 3, 4, 5)
my #matrix = map {
[ ($value) x $i ]
} 1..$j;
# So here we make a list of numbers from 1 to $j.
# For each member of the list we
# create an anonymous array containing a list of $i copies of $value.
# Then we add the anonymous array to the matrix.
return #matrix;
}
Now that the code rewrite is done, here are some links:
Here's a response I wrote titled "How to write a program". It offers some basic guidelines on how to approach writing software projects from specification. It is aimed at beginners. I hope you find it helpful. If nothing else, the links in it should be handy.
For a beginning programmer, beginning with Perl, there is no better book than Learning Perl.
I also recommend heading over to Perlmonks for Perl help and mentoring. It is an active Perl specific community site with very smart, friendly people who are happy to help you. Kind of like Stack Overflow, but more focused.
Good luck!
Instead of using a C-style for loop, you can read data from an array two elements at a time using splice inside a while loop:
while (my ($letter1, $letter2) = splice(#data, 0, 2))
{
# stuff...
}
I've cleaned up some of your other code below:
use strict;
use warnings;
open(my $infile, '<', 'dnadata.txt');
my #data = <$infile>;
close $infile;
chomp #data;
my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
my $aalen = length($aminoacids);
# initialize a 2 x 21 array for holding the amino acid data
my $matrix;
foreach my $i (0 .. 1)
{
foreach my $j (0 .. $aalen-1)
{
$matrix->[$i][$j] = 0;
}
}
# Process all letters in the DNA data
while (my ($letter1, $letter2) = splice(#data, 0, 2))
{
# do something... not sure what?
# you appear to want to look up the letters in a reference table, perhaps $aminoacids?
}