using printf to create columnar data - perl

I am new to perl and scripting in general. I have five variables that hold data and I need to print them as five columns next to each other. Here is the code I have now.
$i = 0;
foreach $line (<inf>){
chomp $line;
@line=split / +/, $line;
$i = $i + 1;
if ($i > $n+1) {
$i = 1;
$numdata = $numdata + 1;
}
if ($i == 1) {
printf "%20s\n", $n, "\n";
} else {
print $i-1, "BEAD", $line[$col], $line[$col+1], $line[$col+2], "\n";
}
# other statistics
}
The output I get from this looks like:
5
1BEAD0.00000e+000.00000e+000.00000e+00
2BEAD0.00000e+000.00000e+000.00000e+00
3BEAD0.00000e+000.00000e+000.00000e+00
4BEAD0.00000e+000.00000e+000.00000e+00
5BEAD0.00000e+000.00000e+000.00000e+00
5
1BEAD9.40631e-02-3.53254e-022.09369e-01
2BEAD-6.69662e-03-3.13492e-012.62915e-01
3BEAD2.98822e-024.60254e-023.61680e-01
4BEAD-1.45631e-013.45979e-021.50167e-01
5BEAD-5.57204e-02-1.51673e-012.95947e-01
5
1BEAD8.14225e-028.10216e-022.76423e-01
2BEAD2.36992e-02-2.74023e-014.47334e-01
3BEAD1.23492e-011.12571e-012.59486e-01
4BEAD-2.05375e-011.25304e-011.85252e-01
5BEAD5.54441e-02-1.30280e-015.82256e-01
I have tried using "%6d %9d %15.6f %28.6f %39.6f\n" before the variables in my print statement to try to space the data out; however, this did not give me the columns I hoped for. Any help or suggestions are appreciated.

If you're using Perl and doing more complex stuff, you may want to look into perlform, which is designed for this kind of thing, or a module like Text::Table.
As for using printf, you can use field-width specifiers to get consistent spacing. Following the Perl docs for sprintf (see the "precision, or maximum width" section), make sure the field width comes before the ".". Your printf string should look something more like this:
printf "%6.d %9.d %15.6f %28.6f %39.6f"
Also, if your values are in an array, you can pass the array as the second argument to printf and save yourself typing everything out. I've also prepended the two other items from your example with unshift:
unshift(@line, $i-1, "BEAD");
printf "%6.d %10s %15.6f %28.6f %39.6f\n", @line;
Note that the %s placeholder has no "." precision specifier (for strings, a precision truncates the value), so leave it out there. If you want e-notation for the numbers, use %e or %g instead of %f (for example %39.6e).
Also, for Perl questions, always check out Perl Monks - much of this answer was culled from a question there.
P.S. Using one of your example rows, here's the proof-of-concept script I tried to make sure everything worked:
perl -e '@line = (8.14225e-02,8.10216e-02,2.76423e-01);
unshift(@line, 4, "BEAD");
printf "%6.d %10s %15.6f %28.6f %39.6e\n", @line;'
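Putting the pieces together, here's a sketch of a helper that builds one aligned row with sprintf. The format_bead_row name and the column widths are illustrative assumptions, and %e is used to keep the scientific notation of the question's data.

```perl
use strict;
use warnings;

# Hypothetical row format: index, label, then three numbers in e-notation.
# Each %14.5e field is wide enough for a sign, so rows stay aligned.
my $ROW_FMT = "%4d %-6s %14.5e %14.5e %14.5e\n";

# Build one aligned row as a string (sprintf returns it instead of printing).
sub format_bead_row {
    my ($index, @values) = @_;
    return sprintf $ROW_FMT, $index, 'BEAD', @values;
}

# Usage with two rows from the question's output
print format_bead_row(1, 9.40631e-02, -3.53254e-02, 2.09369e-01);
print format_bead_row(5, -5.57204e-02, -1.51673e-01, 2.95947e-01);
```

Because the format lives in one variable, changing a width there changes every row consistently.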

Related

print function in Perl

This is the Perl code I use for compiling pressure data.
$data_ct--;
mkdir "365Days", 0777 unless -d "365Days";
my $file_no = 1;
my $j = $num_levels;
for ($i = 0; $i < $data_ct; $i++) {
if ($j == $num_levels) {
close OUT;
$j = 0;
my $file = "365Days/wind$file_no";   # must match the case of the mkdir above
$file_no++;
open OUT, "> $file" or die "Can't open $file: $!";
}
{
$wind_direction = (270-atan2($vwind[$i], $uwind[$i])*(180/pi))%360;
}
$wind_speed = sqrt($uwind[$i]*$uwind[$i]+$vwind[$i]*$vwind[$i]);
printf OUT "%.0f %.0f %.1f\n", $level[$i], $wind_direction, $wind_speed;
$j++;
}
$file_no--;
print STDERR "Wrote out $file_no wind files.\n";
print STDERR "Done\n";
The problem I am having is with the output: when it prints out the numbers, I want it to be in this format:
Level Wind direction windspeed
250 320 1.5
870 56 4.6
Right now when I run the script, the column names do not show up, just the numbers. Can someone show me how to fix the script?
There are several ways to do this in Perl. First, Perl has a built-in report feature called formats. It's been a part of Perl since version 3.0 (about 20 years old). However, it is rarely used. In fact, it is so rarely used I am not even going to attempt to write an example with it because I'd have to spend way too much time relearning it. It's there and documented.
You can try to figure it out for yourself. Or, maybe some old timer Perl programmer might wake up from his nap and help out. (All bets are off if it's meatloaf night at the old age home, though).
Perl has evolved greatly in the last few decades, and this old forms bit represents a much older way of writing Perl programs. It just isn't pretty.
Another, more popular way to do this is the printf function. If you're not familiar with printf from C, it can be a bit intimidating. It depends on formatting codes (the things that start with %) that specify what you want to print (strings, integers, floating point numbers, etc.) and how you want those values formatted.
Fortunately, printf is so useful that most programming languages have their own version of it. Once you learn it, your knowledge transfers to other places. There's also sprintf, which returns the formatted string instead of printing it, so you can assign the result to a variable.
# Define header and line formats
my $header_fmt = "%-5.5s %-14.14s %-9.9s\n";
my $data_fmt = "%5d %14d %9.1f\n";
# Print my table header
printf $header_fmt, "Level", "Wind direction", "windspeed";
my $level = 230;
my $direction = 120;
my $speed = 32.3;
# Print my table data
printf $data_fmt, $level, $direction, $speed;
This prints out:
Level Wind direction windspeed
  230            120      32.3
I like defining the format of my printed lines all together, so I can tweak them to get what I want. It's a great way to make sure your data line lines up with your header.
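One way to push that idea further is to generate both format strings from a single list of column specs, so the header and data lines can't drift apart. The @COLUMNS layout and the make_formats helper are illustrative, not part of the answer above; the widths reproduce its formats.

```perl
use strict;
use warnings;

# Each column: [ header text, width, data conversion ] (illustrative layout).
my @COLUMNS = (
    [ 'Level',          5,  'd'   ],
    [ 'Wind direction', 14, 'd'   ],
    [ 'windspeed',      9,  '.1f' ],
);

# Build matching header and data formats from the same widths.
# Header fields use %-W.Ws so long titles are truncated to the column width.
sub make_formats {
    my @cols = @_;
    my $header_fmt = join(' ', map { "%-$_->[1].$_->[1]s" } @cols) . "\n";
    my $data_fmt   = join(' ', map { "%$_->[1]$_->[2]" } @cols) . "\n";
    return ($header_fmt, $data_fmt);
}

my ($header_fmt, $data_fmt) = make_formats(@COLUMNS);
printf $header_fmt, map { $_->[0] } @COLUMNS;
printf $data_fmt, 230, 120, 32.3;
```

Adding a column then means adding one entry to @COLUMNS rather than editing two format strings.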
Okay, Matlock wasn't on tonight, so this crusty old Perl programmer has plenty of time.
In my previous answer, I said there was an old way of doing forms in Perl, but I didn't remember how it went. Well, I spent some time and got you an example of how it works.
Basically, you sort of need globalish variables. I thought you needed package (our) variables for this to work, but lexical (my) variables work too if I declare them at the same level as my format statements.
Formats are named after the filehandle they write to, with _TOP appended for the header format. Since I'm printing to STDOUT, I define STDOUT_TOP for my heading and STDOUT for my data lines.
Format lines are taken literally, so they must start at the beginning of the line, and the lone . at the end ends the format definition. Notice that I print every line with a single write statement inside the loop. How does it know to print the heading? Perl tracks the number of lines printed: it writes the top-of-page format before the first line, and when it thinks it's at the bottom of a page it writes a form feed followed by a new heading. By default Perl assumes 60-line pages (the default value of $=).
You can choose your own format names: select the filehandle, then set $~ (the format name) and $^ (the top-of-page format name). Perl uses $= as the number of lines on a page, and $- for the number of lines left on the current page. These variables are global, but each applies to the currently selected filehandle, which you change with select. IO::Handle provides methods with more readable names, such as format_name, format_top_name and format_lines_per_page.
#! /usr/bin/env perl
use strict;
use warnings;
use feature qw(say);
my @data = (
{
level => 250,
direction => 320,
speed => 1.5,
},
{
level => 870,
direction => 55,
speed => 4.5,
},
);
my $level;
my $direction;
my $speed;
for my $item_ref ( @data ) {
$level = $item_ref->{level};
$direction = $item_ref->{direction};
$speed = $item_ref->{speed};
write;
}
format STDOUT_TOP =
Level Wind Direction Windspeed
===== ============== =========
.
format STDOUT =
@#### @############# @#####.##
$level, $direction, $speed
.
This prints:
Level Wind Direction Windspeed
===== ============== =========
  250            320      1.50
  870             55      4.50
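To use formats on a filehandle other than STDOUT, here's a sketch that sets the format names per handle with the IO::Handle methods and writes into an in-memory filehandle. The WIND/WIND_TOP format names and the write_wind_report sub are illustrative assumptions; the pictures mirror the example above.

```perl
use strict;
use warnings;
use IO::Handle;    # format_name, format_top_name, format_lines_per_page

# Lexicals must be declared before the formats that use them.
my ($level, $direction, $speed);

format WIND_TOP =
Level Wind Direction Windspeed
===== ============== =========
.

format WIND =
@#### @############# @#####.##
$level, $direction, $speed
.

# Hypothetical helper: write rows to any filehandle with the WIND formats.
sub write_wind_report {
    my ($fh, @rows) = @_;
    $fh->format_name('WIND');          # like setting $~ for that handle
    $fh->format_top_name('WIND_TOP');  # like setting $^ for that handle
    $fh->format_lines_per_page(60);    # like setting $=; 60 is the default
    for my $row (@rows) {
        ($level, $direction, $speed) = @$row;
        write $fh;
    }
    return;
}

# Usage: capture the report in a string through an in-memory filehandle.
open my $out, '>', \my $report or die "Cannot open in-memory handle: $!";
write_wind_report($out, [250, 320, 1.5], [870, 55, 4.5]);
close $out;
print $report;
```

Because the names are set on the handle itself, you never have to select it or name a format after it.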
@Gunnerfan: Try replacing this line in your code:
Your line of code: printf OUT "%.0f %.0f %.1f\n", $level[$i], $wind_direction, $wind_speed;
Replacement code:
# $j is 0 right after each new file is opened, so every file gets a header
if ($j == 0) {
    printf OUT "%-6s %-15s %-10s\n", 'Level', 'Wind direction', 'windspeed';
}
printf OUT "%-6.0f %-15.0f %-10.1f\n", $level[$i], $wind_direction, $wind_speed;
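Here's a sketch of the same idea as a self-contained sub that prints the header and then the rows to whichever filehandle you pass, so each file you open gets its own header. print_wind_table and the column widths are illustrative; since the question uses pi without showing where it comes from, PI is computed here from atan2 as an assumption.

```perl
use strict;
use warnings;

# Core Perl has no pi; this sketch computes it (Math::Trig also exports one).
use constant PI => 4 * atan2(1, 1);

# Header and row formats share the same widths so the columns line up.
my $HEADER_FMT = "%-6s %-15s %-10s\n";
my $ROW_FMT    = "%-6.0f %-15.0f %-10.1f\n";

# Hypothetical helper: print the header, then one row per level.
sub print_wind_table {
    my ($fh, $levels, $uwind, $vwind) = @_;
    printf {$fh} $HEADER_FMT, 'Level', 'Wind direction', 'windspeed';
    for my $i (0 .. $#$levels) {
        my ($u, $v) = ($uwind->[$i], $vwind->[$i]);
        # Same formulas as the question; Perl's % works on integers.
        my $direction = (270 - atan2($v, $u) * (180 / PI)) % 360;
        my $speed     = sqrt($u * $u + $v * $v);
        printf {$fh} $ROW_FMT, $levels->[$i], $direction, $speed;
    }
    return;
}

# Usage: write to an in-memory string; in the question's loop you would
# pass \*OUT right after opening each new file.
open my $out, '>', \my $table or die "Cannot open in-memory handle: $!";
print_wind_table($out, [250, 870], [3, 1], [4, 0]);
close $out;
print $table;
```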

Perl: increment 2d array cell?

I have a set of numerical data for which it is important to me to know which pairs of numbers occurred together, and how many times. Each set of data contains 7 numbers between 1 and 20. There are several hundred sets of data.
Essentially, by parsing each set of my data, I want to create a 20 x 20 array that I can use to keep a count of when pairs of numbers occurred together.
I have done a lot of searching, but maybe I've used the wrong key words. I've seen loads of examples how to create a "2D array" - I know perl doesn't actually do that, and that it's really an array of references - and to print the values contained therein, but nothing really on how to work with one particular cell by number and alter it.
Below is my conceptual code. The commented lines don't work, but illustrate what I want to achieve. I'm reasonably new to coding Perl, and this just seems too advanced for me to understand the examples I've seen and translate them into something I can actually use.
my @datapairs;
while (<DATAFILE>)
{
chomp;
my @data = split(",",$_);
for ($prcount=0; $prcount <=5; $prcount++)
{
for ($othcount=($prcount+1); $othcount<=6; $othcount++)
{
@data[$prcount]=@data[$prcount]+1;
@data[$othcount]=@data[$othcount]+1;
@data[$prcount]=@data[$prcount]-1;
@data[$othcount]=@data[$othcount]-1;
print @data[$prcount]." ".@data[$othcount]."; ";
#@datapairs[@data[$prcount]][@data[$othcount]]++;
#@datapairs[@data[$othcount]][@data[$prcount]]++;
}
}
}
Any input or suggestions would be much appreciated.
Accessing a "cell" in a "2-d array" in Perl (as you already figured out, it's an array of arrayrefs) is simple:
my @datapairs;
# Add 1 for a pair with indexes $i and $j
$datapairs[$i]->[$j]++;
# print that value
print "$datapairs[$i]->[$j]\n";
It's not clear what you mean by "occur together" - if you mean "in the same length-7 array", it's easy:
my @datapairs;
while (<DATAFILE>) {
    chomp;
    my @data = split(",", $_);
    for (my $prcount = 0; $prcount <= 5; $prcount++) {
        for (my $othcount = $prcount + 1; $othcount <= 6; $othcount++) {
            $datapairs[ $data[$prcount] ]->[ $data[$othcount] ]++;
        }
    }
}
# Print: the numbers run from 1 to 20, and cells never incremented are undef
for (my $i = 1; $i <= 20; $i++) {
    for (my $j = 1; $j <= 20; $j++) {
        printf "%d, ", $datapairs[$i]->[$j] // 0;
    }
    print "\n";
}
As a side note, personally, just for stylistic reasons, I strongly prefer to reference EVERYTHING, e.g. use arrayref of arrayrefs instead of array of arrays. E.g.
my $datapairs;
# Add 1 for a pair with indexes $i and $j
$datapairs->[$i]->[$j]++;
# print that value
print "$datapairs->[$i]->[$j]\n";
The second (and third...) arrow dereference operator is optional in Perl but I personally find it significantly more readable to enforce its usage - it spaces out the index expressions.
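Building on the arrayref style, here's a sketch that counts each pair in both orientations, which is what the question's two commented-out lines were aiming for. The count_pairs name and the sample lines are illustrative; in practice you'd feed it the lines read from DATAFILE.

```perl
use strict;
use warnings;

# Count how often each pair of numbers appears in the same data set.
# Both orientations are incremented, so $counts->[$x][$y] == $counts->[$y][$x].
sub count_pairs {
    my @sets   = @_;    # each set is an array ref of numbers
    my $counts = [];    # arrayref of arrayrefs, autovivified on demand
    for my $set (@sets) {
        for my $p (0 .. $#$set - 1) {
            for my $q ($p + 1 .. $#$set) {
                my ($x, $y) = ($set->[$p], $set->[$q]);
                $counts->[$x][$y]++;
                $counts->[$y][$x]++;
            }
        }
    }
    return $counts;
}

# Usage: two short comma-separated lines, split the same way as the question.
my @lines  = ("1,2,3", "2,3,20");
my $counts = count_pairs(map { [ split /,/ ] } @lines);
for my $i (1 .. 20) {
    print join(',', map { $counts->[$i][$_] // 0 } 1 .. 20), "\n";
}
```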

command line pivot

I've been hunting around the past few days for a set of command line tools, a perl or an awk script that allow me to very quickly transpose the following data:
Row|Col|Val
1|A|foo
1|B|bar
1|C|I have a real
2|A|bad
2|C|hangover
into this:
A|B|C
foo|bar|I have a real
bad||hangover
Note that there is only one value in the dataset for each "cell" (i.e., as with a spreadsheet, there aren't any duplicates of Row "1" Col "A")
I've tried various awk shell implementations for transposing data - but can't seem to get them working. One idea I had was to cut each "Col" value into a separate file, then use the "join" command line to put them back together by "Row" -- but there MUST be an easier way. I'm sure this is just incredibly simple to do - but I'm struggling a bit.
My input files have Cols A through G (mostly including variable length strings), and 10,000 Rows. If I can avoid loading everything into memory that would be a huge plus.
Beer-by-mail for anyone who's got the answer!
As always - many thanks in advance for your help.
Cheers,
Josh
p.s. - I'm a bit surprised that there isn't an out-of-the-box command line util for doing this very basic type of pivot/transposition operation. I looked at http://code.google.com/p/openpivot/ and at http://code.google.com/p/crush-tools/ both of which seem to require aggregate calcs.
I can do this in gawk, but not nawk.
#!/usr/local/bin/gawk -f
BEGIN {
FS="|";
}
{
rows[$1]=1; cols[$2]=1; values[$1][$2]=$3;
}
END {
for (col in cols) {
output=output sprintf("|%s", col);
}
print substr(output, 2);
for (row in rows) {
output="";
for (col in cols) {
output=output sprintf("|%s", values[row][col]);
}
print substr(output, 2);
}
}
And it even works:
ghoti@pc $ cat data
1|A|foo
1|B|bar
1|C|I have a real
2|A|bad
2|C|hangover
ghoti@pc $ ./doit.gawk data
A|B|C
foo|bar|I have a real
bad||hangover
ghoti@pc $
I'm not sure how well this will work with 10000 rows, but I suspect if you've got the memory for it, you'll be fine. I can't see how you can avoid loading things into memory except by storing things in separate files which you'd later join. Which is pretty much a manual implementation of virtual memory.
UPDATE:
Per comments:
#!/usr/local/bin/gawk -f
BEGIN {
FS="|";
}
{
rows[$1]=1; cols[$2]=1; values[$1,$2]=$3;
}
END {
for (col in cols) {
output=output sprintf("|%s", col);
}
print output;
for (row in rows) {
output="";
for (col in cols) {
output=output "|" values[row,col];
}
print row output;
}
}
And the output:
ghoti@pc $ ./doit.awk data
|A|B|C
1|foo|bar|I have a real
2|bad||hangover
ghoti@pc $
Just use a hash.
If you don't want to load everything into memory, you may need a disk-backed store such as DBM::Deep instead of a plain in-memory hash.
my %table;
my $maxa = 'A';
my $maxr = 0;
<>;
while (<>) {
chomp;
my ($row, $col, $val) = split /\|/;
$table{$row}->{$col} = $val;
$maxr = $row if ($row > $maxr);
$maxa = $col if ($col gt $maxa);
}
for (my $c = 'A' ; $c lt $maxa ; $c++) {
print $c . '|';
}
print "$maxa\n";
for (my $r = 1 ; $r <= $maxr ; $r++) {
for (my $c = 'A' ; $c lt $maxa ; $c++) {
print $table{$r}->{$c} . '|';
}
print $table{$r}->{$maxa} . "\n";
}
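Since neither awk's for (key in array) nor Perl's hash iteration guarantees any order, here's a sketch of the hash approach that keeps rows and columns in first-seen order and skips the Row|Col|Val header. pivot is a hypothetical name; like the answers above, it keeps the whole table in memory.

```perl
use strict;
use warnings;

# Pivot "Row|Col|Val" lines into one line per row and one column per Col.
# Rows and columns come out in first-seen order; missing cells are empty.
sub pivot {
    my @lines = @_;
    my (%cell, @rows, @cols, %seen_row, %seen_col);
    for my $line (@lines) {
        chomp(my $copy = $line);
        my ($row, $col, $val) = split /\|/, $copy, 3;
        next if $row eq 'Row';    # skip the header line
        push @rows, $row unless $seen_row{$row}++;
        push @cols, $col unless $seen_col{$col}++;
        $cell{$row}{$col} = $val;
    }
    my @out = join('|', @cols);
    for my $row (@rows) {
        push @out, join('|', map { $cell{$row}{$_} // '' } @cols);
    }
    return @out;
}

# Usage: in a real script, pivot(<>) reads the files named on the command line.
print "$_\n" for pivot(
    "Row|Col|Val\n", "1|A|foo\n", "1|B|bar\n",
    "1|C|I have a real\n", "2|A|bad\n", "2|C|hangover\n",
);
```

Note that a column first seen in a later row is appended after the earlier columns, so sort @cols if you need alphabetical order.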
If you know Awk, I'd recommend you look at Perl. Perl is just much more powerful than Awk. The advantage is that if you know BASH/Bourne shell and Awk, much of the syntax in Perl will be familiar.
Another nice thing about Perl is the entire CPAN repository which allows you to download already written Perl modules to use in your program. A quick search in CPAN brought up Data::Pivot which looks like (at a very quick glance) it might do what you want.
If not, take a look at the pivot function in Acme::Tools. Or try one of the many others.
Others have already provided a few solutions, but I recommend you look at what the CPAN Perl archive has. It's a very powerful tool for things like this.

Parsing a log file using perl

I have a log file where some of the entries look like this:
YY/MM/DD HH:MM:SS:MMM <Some constant text> v1=XXX v2=YYY v3=ZZZ v4=AAA AND BBB v5=CCC
and I'm trying to get it into a CSV format:
Date,Time,v1,v2,v3,v4,v5
YY/MM/DD,HH:MM:SS:MMM,XXX,YYY,ZZZ,AAA AND BBB,CCC
I'd like to do this in Perl - speaking personally, I could probably do it far quicker in other languages but I'd really like to expand my horizons a bit.
So far I can get as far as reading the file in and picking out only lines which meet my criteria, but I can't seem to get the next stage done. I'll need to split up the input line, but so far I just can't work out how to do this. I've looked at s// and m// but they don't really give me what I want. If anyone can advise me how this can be done or give me pointers, I'd much appreciate it.
Important points:
The values in the second part of the line are always in the same order, so mapping / re-organising is not necessarily a problem.
Some of the fields have free text which is not quoted :( but as the labels all start v<number>= I'm hoping parsing this should still be a possibility.
Since there is no one delimiter, you'll need to try this a few different ways:
First, split on ' ', then take the first three values:
my @array = split / /, $line;
my ($date, $time, $constant) = splice @array, 0, 3;
Join the rest of the fields together again, and re-split on v\d+= to get the values:
my $rest = join ' ', @array;
# $rest now holds "v1=XXX v2=YYY ...", preceded by any leftover words of the constant text
my @values = split /\s*v\d+=/, $rest;
shift @values; # drop whatever came before "v1=" (empty if the constant text is one word)
print join(',', $date, $time, @values), "\n";
Edit: Here's another approach that may be easier to follow, and is slightly more efficient. This takes advantage of the fact that your constant text occurs between the date/time and the value list.
# assume that CONSTANT is your constant text
my ($datetime, $valuelist) = split /\s*CONSTANT\s*/, $line;
my ($date, $time) = split / /, $datetime;
my @values = split /\s*v\d+=/, $valuelist;
shift @values;
print join(',', $date, $time, @values), "\n";
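The second approach can be wrapped in a sub and checked against the sample line from the question. $CONSTANT_TEXT is a placeholder for your log's real constant text and log_line_to_csv is a hypothetical name; \Q...\E keeps any regex metacharacters in the constant text literal.

```perl
use strict;
use warnings;

# Placeholder: substitute the real constant text from your log.
my $CONSTANT_TEXT = '<Some constant text>';

sub log_line_to_csv {
    my ($line) = @_;
    chomp $line;
    # Split once around the constant text: date/time before it, values after.
    my ($datetime, $valuelist) = split /\s*\Q$CONSTANT_TEXT\E\s*/, $line, 2;
    my ($date, $time) = split / /, $datetime;
    my @values = split /\s*v\d+=/, $valuelist;
    shift @values;    # the text before "v1=" is empty
    # join first, then add the newline, so there is no trailing comma
    return join(',', $date, $time, @values) . "\n";
}

print "Date,Time,v1,v2,v3,v4,v5\n";
print log_line_to_csv(
    "YY/MM/DD HH:MM:SS:MMM <Some constant text> v1=XXX v2=YYY v3=ZZZ v4=AAA AND BBB v5=CCC\n"
);
```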
What have you tried with regular expressions and how has it failed? A regex with m// works fine for me:
#!/usr/bin/env perl
use strict;
use warnings;
print "Date,Time,v1,v2,v3,v4,v5\n";
while (my $line = <DATA>) {
my @matched = $line =~ m{^([^ ]+) ([^ ]+).*v1=(.*) v2=(.*) v3=(.*) v4=(.*) v5=(.*)};
print join(',', @matched), "\n";
}
__DATA__
YY/MM/DD HH:MM:SS:MMM <Some constant text> v1=XXX v2=YYY v3=ZZZ v4=AAA AND BBB v5=CCC
Two caveats:
1) v1 cannot contain the substring " v2=", v2 cannot contain " v3=", etc., but, with such a loose format, that's something that would likely cause problems for a human attempting to parse it, too.
2) This code assumes that there will always be v1 through v5. If there are fewer than five vN fields, the line will fail to match. If there are more, all additional fields will be appended to v5 (including their vN tags).
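If the number of vN fields can vary, a sketch along these lines captures every vN=value pair instead of hard-coding v1 through v5. It still shares caveat 1 (a value containing " vN=" will be cut short), and parse_fields is a hypothetical name.

```perl
use strict;
use warnings;

# Capture every "vN=value" pair; a value runs until the next " vN=" or end of line.
sub parse_fields {
    my ($line) = @_;
    chomp $line;
    my ($date, $time) = $line =~ /^(\S+) (\S+)/ or return;
    my %field = $line =~ /\b(v\d+)=(.*?)(?=\s+v\d+=|$)/g;
    return ($date, $time, \%field);
}

my ($date, $time, $field) = parse_fields(
    "YY/MM/DD HH:MM:SS:MMM <Some constant text> v1=XXX v2=YYY v4=AAA AND BBB\n"
);
# Missing fields (v3 and v5 here) are printed as empty strings.
print join(',', $date, $time, map { $field->{"v$_"} // '' } 1 .. 5), "\n";
```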
If the log is fixed-width, you're better off using unpack; you'll see the performance benefit if the log grows very large.
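A minimal unpack sketch, assuming a fixed layout of an 8-character date, a space, a 12-character time, a space and then the rest of the line; the real widths of your log are an assumption here. In the template, A strips trailing spaces, x skips one byte, and A* takes whatever is left.

```perl
use strict;
use warnings;

# Assumed layout: date (8), space, time (12), space, remainder.
my $LAYOUT = 'A8 x A12 x A*';

sub split_fixed {
    my ($line) = @_;
    chomp $line;
    return unpack $LAYOUT, $line;
}

my ($date, $time, $rest) = split_fixed("YY/MM/DD HH:MM:SS:MMM <Some constant text> v1=XXX\n");
print "$date|$time|$rest\n";
```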

Why does my Perl for loop exit early?

I am trying to get a Perl loop to work from an array that contains 6 elements. I want the loop to pull out two elements from the array, perform certain functions, and then loop back and pull out the next two elements from the array until the array runs out of elements. The problem is that the loop only pulls out the first two elements and then stops. Some help here would be greatly appreciated.
open(infile, 'dnadata.txt');
my @data = <infile>;
chomp @data;
#print @data; #Debug
my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
my $aalen = length($aminoacids);
my $i=0;
my $j=0;
my #matrix =();
for(my $i=0; $i<2; $i++){
for( my $j=0; $j<$aalen; $j++){
$matrix[$i][$j] = 0;
}
}
The guidelines for this program state that it should ignore gaps, which means that any DNA code that is matched up with a gap should be ignored. So the sequences that are passed on need to have the positions aligned with gaps removed.
I need to divide the length of the array by two, since I am comparing two sequences in this part of the loop.
#$lemseqcomp = $lenarray / 2;
#print $lenseqcomp;
#I need to initialize these scalar values.
$junk1 = " ";
$junk2 = " ";
$seq1 = " ";
$seq2 = " ";
This is the loop that is causing issues. I believe that the first loop should move back to the array and pull out the next element each time it loops, but it doesn't.
for($i=0; $i<$lenarray; $i++){
#This code should remove the the last value of the array once and
#then a second time. The sequences should be the same length at this point.
my $last1 =pop(@data1);
my $last2 =pop(@data1);
for($i=0; $i<length($last1); $i++){
my $letter1 = substr($last1, $i, 1);
my $letter2 = substr($last2, $i, 1);
if(($letter1 eq '-')|| ($letter2 eq '-')){
#I need to put the sequences I am getting rid of somewhere. Here is a good place as any.
$junk1 = $letter1 . $junk1;
$junk2 = $letter1 . $junk2;
}
else{
$seq1 = $letter1 . $seq1;
$seq2 = $letter2 . $seq2;
}
}
}
print "$seq1\n";
print "$seq2\n";
print "@data1\n";
I am actually trying to create a substitution matrix from scratch and return the data. The code looks weird because it isn't finished yet and I got stuck.
This is the test sequence if anyone is curious.
YFRFR
YF-FR
FRFRFR
ARFRFR
YFYFR-F
YFRFRYF
First off, if you're going to work with sequence data, use BioPerl. Life will be so much easier. However...
Since you know you'll be comparing the lines from your input file as pairs, it makes sense to read them into a data structure that reflects that. As suggested elsewhere, an array like @data = ([line1, line2], [line3, line4]) ensures that the correct pairs of lines are always together.
What I'm not clear on is what you're trying to do:
a) are you generating a consensus sequence where the 2 sequences differ only by gaps, or
b) are your 2 sequences significantly different, and you're trying to exclude the non-aligning parts and then generate a consensus?
So, does the first pair represent your data, or is it more like the second?
ATCG---AAActctgGGGGG--taGC
ATCGcccAAActctgGGGGGTTtaGC
ATCG---AAActctgGGGGG--taGCTTTTTTTTTTTTTTTTTTTTTTTTTTTTTT
ATCGcccAAActctgGGGGGTTtaGCGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
The problem is that you're using $i as the counter variable for both your loops, so the inner loop modifies the counter out from under the outer loop. Try changing the inner loop's counter to $j, or using my to localize them properly.
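To see the counter clash in isolation, here's a small sketch with made-up loop limits; the two sub names are illustrative.

```perl
use strict;
use warnings;

# The bug: both loops share one counter, so the inner loop leaves $i past
# the outer limit and the outer loop stops after one pass.
# (With a much smaller inner limit, a shared counter can loop forever instead.)
sub outer_passes_shared {
    my ($outer_n, $inner_n) = @_;
    my $passes = 0;
    my $i;
    for ($i = 0; $i < $outer_n; $i++) {
        $passes++;
        for ($i = 0; $i < $inner_n; $i++) { }    # clobbers the outer $i
    }
    return $passes;
}

# The fix: each loop gets its own lexical counter.
sub outer_passes_separate {
    my ($outer_n, $inner_n) = @_;
    my $passes = 0;
    for (my $i = 0; $i < $outer_n; $i++) {
        $passes++;
        for (my $j = 0; $j < $inner_n; $j++) { }
    }
    return $passes;
}

print outer_passes_shared(3, 5), "\n";
print outer_passes_separate(3, 5), "\n";
```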
Don't store your values as a flat array; store them as a two-dimensional array (an array of array references):
my @dataset = ([$val1, $val2], [$val3, $val4]);
or
my @dataset;
push (@dataset, [$val_n1, $val_n2]);
Then:
for my $value (@dataset) {
### Do stuff with $value->[0] and $value->[1]
}
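Combining the pairs structure with the gap rule from the question (drop any position where either sequence has a '-'), a sketch might look like this. pair_up and strip_gap_columns are hypothetical names, and the code assumes both sequences in a pair are the same length, as the question states.

```perl
use strict;
use warnings;

# Group lines into pairs (an array of array refs).
sub pair_up {
    my @lines = @_;
    my @pairs;
    while (my @two = splice(@lines, 0, 2)) {
        push @pairs, [@two] if @two == 2;    # an odd trailing line is ignored
    }
    return @pairs;
}

# Drop every column where either sequence has a gap.
sub strip_gap_columns {
    my ($seq1, $seq2) = @_;
    my ($out1, $out2) = ('', '');
    for my $k (0 .. length($seq1) - 1) {
        my ($c1, $c2) = (substr($seq1, $k, 1), substr($seq2, $k, 1));
        next if $c1 eq '-' || $c2 eq '-';
        $out1 .= $c1;
        $out2 .= $c2;
    }
    return ($out1, $out2);
}

# The question's test sequences
my @lines = qw(YFRFR YF-FR FRFRFR ARFRFR YFYFR-F YFRFRYF);
for my $pair (pair_up(@lines)) {
    my ($s1, $s2) = strip_gap_columns(@$pair);
    print "$s1\n$s2\n\n";
}
```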
There are lots of strange things in your code: you are initializing a matrix then not using it; reading a whole file into an array; scanning a string C style but then not doing anything with the unmatched values; and finally, just printing the two last processed values (which, in your case, are the two first elements of your array, since you are using pop.)
Here's a guess.
use strict;
my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
# Preparing a regular expression. This is kind of useful if processing large
# amounts of data. This will match anything that is not in the string above.
my $regex = qr([^$aminoacids]);
# Our work function.
sub do_something {
my ($first, $second) = @_;
$first =~ s/$regex//g; # removing unwanted characters
$second =~ s/$regex//g; # ditto
# Printing, saving, whatever...
print "Something: $first - $second\n";
return ($first, $second);
}
my $prev;
while (<>) {
chomp;
if ($prev) {
do_something($prev, $_);
$prev = undef;
} else {
$prev = $_;
}
}
print STDERR "Warning: trailing data: $prev\n"
if $prev;
Since you are a total Perl/programming newbie, I am going to show a rewrite of your first code block, then I'll offer you some general advice and links.
Let's look at your first block of sample code. There is a lot of stuff all strung together, and it's hard to follow. I, personally, am too dumb to remember more than a few things at a time, so I chop problems into small pieces that I can understand. This is (was) known as 'chunking'.
One easy way to chunk your program is to write subroutines. Take any particular action or idea that is likely to be repeated or would make the current section of code long and hard to understand, and wrap it up into a nice neat package and get it out of the way.
It also helps if you add space to your code to make it easier to read. Your mind is already struggling to grok the code soup, why make things harder than necessary? Grouping like things, using _ in names, blank lines and indentation all help. There are also conventions that can help, like making constant values (values that cannot or should not change) all capital letters.
use strict; # Using strict will help catch errors.
use warnings; # ditto for warnings.
use diagnostics; # diagnostics will help you understand the error messages
# Put constants at the top of your program.
# It makes them easy to find, and change as needed.
my $AMINO_ACIDS = 'ARNDCQEGHILKMFPSTWYV';
my $AMINO_COUNT = length($AMINO_ACIDS);
my $DATA_FILE = 'dnadata.txt';
# Here I am using subroutines to encapsulate complexity:
my @data = read_data_file( $DATA_FILE );
my @matrix = initialize_matrix( $AMINO_COUNT, 2, 0 );    # 2 rows of $AMINO_COUNT columns
# now we are done with the first block of code and can do more stuff
...
# This section down here looks kind of big, but it is mostly comments.
# Remove the didactic comments and suddenly the code is much more compact.
# Here are the actual subs that I abstracted out above.
# It helps to document your subs:
# - what they do
# - what arguments they take
# - what they return
# Read a data file and returns an array of dna strings read from the file.
#
# Arguments
# data_file => path to the data file to read
sub read_data_file {
my $data_file = shift;
# Here I am using a 3 argument open, and a lexical filehandle.
open( my $infile, '<', $data_file )
or die "Unable to open $data_file - $!\n";
# I've left slurping the whole file intact, even though it can be very inefficient.
# Other times it is just what the doctor ordered.
my @data = <$infile>;
chomp @data;
# I return the data array rather than a reference
# to keep things simple since you are just learning.
#
# In my code, I'd pass a reference.
return @data;
}
# Initialize a matrix (or 2-d array) with a specified value.
#
# Arguments
# $i => width of matrix
# $j => height of matrix
# $value => initial value
sub initialize_matrix {
my $i = shift;
my $j = shift;
my $value = shift;
# I use two powerful perlisms here: map and the range operator.
#
# map is a list construction function that is very very powerful.
# it calls the code in brackets for each member of the list it operates against.
# Think of it as a for loop that keeps the result of each iteration,
# and then builds an array out of the results.
#
# The range operator `..` creates a list of intervening values. For example:
# (1..5) is the same as (1, 2, 3, 4, 5)
my @matrix = map {
[ ($value) x $i ]
} 1..$j;
# So here we make a list of numbers from 1 to $j.
# For each member of the list we
# create an anonymous array containing a list of $i copies of $value.
# Then we add the anonymous array to the matrix.
return @matrix;
}
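The reason initialize_matrix uses map rather than the repetition operator on an array reference is worth seeing directly: x copies the reference, so every row ends up being the same array. A small sketch:

```perl
use strict;
use warnings;

my @shared = ([ (0) x 3 ]) x 2;             # two copies of ONE row reference
my @fresh  = map { [ (0) x 3 ] } 1 .. 2;    # map builds a new row each time

$shared[0][1] = 9;
$fresh[0][1]  = 9;

print "shared: @{ $shared[1] }\n";    # the change shows up in row 1 too
print "fresh:  @{ $fresh[1] }\n";     # row 1 is untouched
```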
Now that the code rewrite is done, here are some links:
Here's a response I wrote titled "How to write a program". It offers some basic guidelines on how to approach writing software projects from specification. It is aimed at beginners. I hope you find it helpful. If nothing else, the links in it should be handy.
For a beginning programmer, beginning with Perl, there is no better book than Learning Perl.
I also recommend heading over to Perlmonks for Perl help and mentoring. It is an active Perl specific community site with very smart, friendly people who are happy to help you. Kind of like Stack Overflow, but more focused.
Good luck!
Instead of using a C-style for loop, you can read data from an array two elements at a time using splice inside a while loop:
while (my ($letter1, $letter2) = splice(@data, 0, 2))
{
# stuff...
}
I've cleaned up some of your other code below:
use strict;
use warnings;
open(my $infile, '<', 'dnadata.txt') or die "Can't open dnadata.txt: $!";
my @data = <$infile>;
close $infile;
chomp @data;
my $aminoacids = 'ARNDCQEGHILKMFPSTWYV';
my $aalen = length($aminoacids);
# initialize a 2 x 20 array for holding the amino acid data
my $matrix;
foreach my $i (0 .. 1)
{
foreach my $j (0 .. $aalen-1)
{
$matrix->[$i][$j] = 0;
}
}
# Process all letters in the DNA data
while (my ($letter1, $letter2) = splice(@data, 0, 2))
{
# do something... not sure what?
# you appear to want to look up the letters in a reference table, perhaps $aminoacids?
}
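Note that splice empties @data as it goes. If you still need the array afterwards, one alternative is to step an index by two; pairs_by_index is a hypothetical helper name, and the data is the question's test sequence.

```perl
use strict;
use warnings;

# Collect elements two at a time without modifying the caller's array.
sub pairs_by_index {
    my @data = @_;
    my @pairs;
    for (my $k = 0; $k + 1 < @data; $k += 2) {
        push @pairs, [ $data[$k], $data[$k + 1] ];
    }
    return @pairs;
}

my @data = qw(YFRFR YF-FR FRFRFR ARFRFR YFYFR-F YFRFRYF);
for my $pair (pairs_by_index(@data)) {
    print "$pair->[0] / $pair->[1]\n";
}
print scalar(@data), " lines still in \@data\n";
```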