A little help with loops on perl - perl

I am having trouble specifiying the correct algorithm. I am iterating over an input file with loops. The issue that I have is on the last loop.
#!/usr/bin/perl
# Lab #4
# Judd Bittman
# http://www-users.cselabs.umn.edu/classes/Spring-2011/csci3003/index.php?page=labs
# this site has what needs to be in the lab
# lab4 is the lab instructions
# yeast protein is the part that is being read
use warnings;
use strict;
my $file = "<YeastProteins.txt";
open(my $proteins, $file);
my #identifier;
my #qualifier;
my #molecularweight;
my #pi;
while (my $line1 = <$proteins>) {
#print $line1;
chomp($line1);
my #line = split(/\t/, $line1);
push(#identifier, $line[0]);
push(#qualifier, $line[1]);
push(#molecularweight, $line[2]);
push(#pi, $line[3]);
}
my $extreme = 0;
my $ex_index = 0;
for (my $index = 1; $index < 6805; $index++) {
if ( defined($identifier[$index])
&& defined($qualifier[$index])
&& defined($molecularweight[$index])
&& defined($pi[$index])) {
# print"$identifier[$index]\t:\t$qualifier[$index]:\t$molecularweight[$index]:\n$pi[$index]";
}
if ( defined($identifier[$index])
&& defined($qualifier[$index])
&& defined($pi[$index])) {
if (abs($pi[$index] - 7) > $extreme && $qualifier[$index] eq "Verified")
{
$extreme = abs($pi[$index] - 7);
$ex_index = $identifier[$index];
print $extreme. " " . $ex_index . "\n";
}
}
}
print $extreme;
print "\n";
print $ex_index;
print "\n";
# the part above does part b of the assignment
# YLR204W,its part of the Mitochondrial inner membrane protein as well as a processor.
my $exindex = 0;
my $high = 0;
# two lines above and below is part c
# there is an error and I know there is something wrong
for (my $index = 1; $index < 6805; $index++) {
if ( defined($qualifier[$index])
&& ($qualifier[$index]) eq "Verified"
&& defined($molecularweight[$index])
&& (abs($molecularweight[$index]) > $high)) {
$high = (abs($molecularweight[$index]) > $high); # something wrong on this line, I know I wrote something wrong
$exindex = $identifier[$index];
}
}
print $high;
print "\n";
print $exindex;
print "\n";
close($proteins);
exit;
On the final loop I want my loop to hold on to the protein that is verified and has the highest molecular mass. This is in the input file. What code can I use to tell the program that I want to hold the highest number and its name? I feel like I am very close but I can't put my finger on it.

First, a note about perl - in general, it's more common to use foreach style loops rather than c-style indexed loops. For example:
for my $protein (#proteins) {
#do something with $p
}
(Your situation might require it, I just thought I'd mention this)
To address your specific issue though:
$high = (abs($molecularweight[$index])>$high);
$high is being set to the result of the boolean test being performed. Remove the >$high part (which is being tested in your if statement) and you'll likely end up with what you intended.

You likely want a more complex data structure, such as a nested hash. It's hard to give a solid example without more knowledge of the data, but, say your first identifier were abc, the second one was def, etc:
my %protein_entries = (
abc => {
qualifier => 'something',
molecular_weight => 1234,
pi => 'something',
},
def => {
qualifier => 'something else',
molecular_weight => 5678,
pi => 'something else',
},
# …
);
Then, rather than having several different arrays and keeping track of which belongs to which, you get at the elements like so:
Then, if you want to get at the highest by molecular weight, you can sort the identifiers by their molecular weight, then slice off the highest one:
my $highest = (sort {
$protein_entries{$a}{molecular_weight}
<=>
$protein_entries{$b}{molecular_weight}
} keys %protein_entries)[1];
You're having problem with your algorithm because you're not structuring your data properly, basically.
In this example, $highest will hold def, then later you can go back and fetch $protein_entries{def}{molecular_weight} or any of the other keys in the anonymous hash referenced by $protein_entries{def}, thus being easily able to recall any relevant associated data.

Just change:
$high = (abs($molecularweight[$index]) > $high);
To this:
$high = abs($molecularweight[$index]) if (abs($molecularweight[$index]) > $high);
At the end of the loop, $high will be the highest value in $molecularweight array.

Related

How can I test if a value is repeated in array?

I'm doing a subroutine that takes a list of numbers as an argument. What I would like to do is check if there is a repeated value in that list. In case there are repeated numbers, print a message and stop the program. In case there are no repeated numbers, continue with the execution.
For example:
if (there_is_number_repeated){
print "There is a number repeated";}
else{
run this code...}
I was trying to do this creating a hash with the values of that list, and then check if there are values > 1.
use strict;
use warnings;
sub name_subroutine{
my (#numbers)=#_;
my $n=scalar(#numbers);
my %table=();
foreach my $i(#numbers){
if (exists $tabla{$i}){
$tabla{$i}+=1;}
else{
$tabla{$i} = 1;
}
my #values = values %tabla;
}
}
It's here where I do not know to continue. Is there any way to do this in an amateurish way? I'm newbie in Perl.
Thanks!
I would just do:
my %uniq;
if ( grep ++$uniq{$_} > 1, #numbers ) {
# some numbers are repeated
}
In your existing code (with a couple corrections):
my %table=();
foreach my $i(#numbers){
if (exists $table{$i}){
$table{$i}+=1;}
else{
$table{$i} = 1;
}
}
my #values = values %table;
you don't need to check for exists; doing += 1 or ++ will set it to 1 if it didn't exist. And you don't want the values (those are just the counts of how many times each array value was found), you want the keys, specifically those for which the value is > 1:
my #repeated = grep $table{$_} > 1, keys %table;
my #arr = #_;
my $count = #arr;
for(my $i=0;$i<$count;$i++)
{
my $num = $arr[$i];
for(my $j=0; $j<$count,$j!=$i; $j++)
{
if($num == $arr[$j])
{
print "\n$num is repeated";
last;
}
}
}
Tried and tested. Cheers.

How can I use map to clean up this Perl code?

The code below does what I want it to. It prints the list and adds an asterisk at the end of lines that are not sequential, e.g. if you skip from 1 to 3 or 3 to 5.
use strict;
use warnings;
#note: thanks to all who helped with formatting issues.
#note: I recognize a hash would be a much better option for what I want to do.
my #printy = ("1 -> this",
"5 -> that",
"3 -> the other",
"6 -> thus and such");
#printy = sort {num($a) <=> num($b)} #printy;
my $thisID = 0;
my $lastID = 0;
#print out (line)* if initial number is >1 more than previous, or just (line) otherwise
for (#printy)
{
$thisID = $_; $thisID =~s/ .*//g;
if ($thisID - $lastID != 1) { $_ =~ s/$/ \*/; }
$lastID = $thisID;
}
print join("\n", #printy) . "\n";
sub num
{
my $x = $_[0];
$x =~ s/ .*//;
return $x;
}
But I think I can do better. It feels tangled, and my intuition tells me I'm missing something powerful that could do the job more easily, one that takes maybe two lines.
Now I've used the map() command before, but only to look at/modify an element, not how it compares to a previous element. Can anyone recommend a way to make this more succinct? Thanks!
Since Perl promotes TIMTOWTDI, map may seem like an attractive option at first. Let's see how it fares for this task:
Schwartzian thought process
Since access to neighboring elements is necessary, it's convenient to work with the indices. Since for n elements, there are n-1 pairs of neighbors, you don't have to loop n times. In this case, let's start with 1 instead of the usual 0:
1 .. $#printy
One can access neighboring elements by calling the relevant indices inside map.
map { my $prev = $printy[$_-1]; my $curr = $printy[$_] } 1 .. $#printy;
An array slice expresses this more succinctly:
map { my ( $prev, $curr ) = #printy[$_-1,$_]; } 1 .. $#printy;
Let's introduce the real logic related to comparing numbers using the num subroutine:
map {
my ( $prev, $curr ) = #printy[$_-1,$_];
if ( num($curr) - num($prev) > 1 ) {
"$curr *";
}
else {
$curr;
}
} 1 .. $#printy;
Which is equivalent to:
map {
my ( $prev, $curr ) = #printy[$_-1,$_];
$curr .= " *" if num($curr) - num($prev) > 1;
$curr
} 1 .. $#printy;
Remember not to forget the first element:
#printy = ( $printy[0],
map {
my ( $prev, $curr ) = #printy[$_-1,$_];
$curr .= " *" if num($curr) - num($prev) > 1;
$curr
} 1 .. $#printy
);
Given the final result, I'm not so sure I'd use map for this:
It's hard to read
There's a lot going on
The next person working on your code will love you
No map needed, just add some spaces here and there, and remove stuff that's not needed ($_, join, etc.). Also, reuse num() inside the loop, no need to repeat the regex.
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
my #printy = sort { num($a) <=> num($b) }
'1 -> this', '5 -> that', '3 -> the other', '6 -> thus and such';
my $thisID = my $lastID = 0;
for (#printy) {
$thisID = num($_);
$_ .= ' *' if $thisID - $lastID != 1;
$lastID = $thisID;
}
say for #printy;
sub num {
my ($x) = #_;
$x =~ s/ .*//;
return $x;
}
Also, reimplementing num using /(\d+)/ instead of substitution might tell its purpose more clearly.
I agree with choroba that there is no need for a map here. But I'd refactor a little bit anyway.
use strict;
use warnings;
use feature 'say';
my #printy = ( "1 -> this", "5 -> that", "3 -> the other", "6 -> thus and such" );
my $last_id = 0;
foreach my $line ( sort { num($a) <=> num($b) } #printy ) {
my $current_id = num($line);
$line .= ' *' unless $current_id - $last_id == 1;
$last_id = $current_id;
}
say for #printy;
# returns the number at the start of a string
sub num {
$_[0] =~ m/^(\d+)/;
return $1;
}
I moved the sort down into the foreach, because you shouldn't rely on the fact that your input is sorted.
I changed the variable names to go with the convention that there should be no capital letters in variable names, and I used say, which is like print with a system-specific newline at the end.
I also moved the $current_id into the loop. That doesn't need to be visible outside because it's lexical to that loop. Always declare variables in the smallest possible scope.
You already had that nice num function, but you're not using it inside of the loop to get the $current_id. Use it.
I think if the input gets very long, it might make sense to go with a map construct because sorting will be very expensive at some point. Look at the Schwartzian transform for caching the calculation before sorting. You could then do everything at once. But it won't be readable for a beginner any more.
Your data yells "Use a hash!" to me.
If we had a hash,
my %printy =
map { split / -> / }
"1 -> this", "5 -> that", "3 -> the other", "6 -> thus and such";
The solution would simply be:
my #order = sort { $a <=> $b } keys(%printy);
for my $i (#order[1..$#order]) {
$printy{$i} .= ' *'
if !exists($printy{$i-1});
}
print "$_ -> $printy{$_}\n"
for #order;
This can be golfed down, though I'm not sure it's worth it.
my $count;
print "$_ -> $printy{$_}".( !$count++ || exists($printy{$_-1}) ? "" : " *" )."\n"
for
sort { $a <=> $b }
keys(%printy);
That for can be converted into a map, but it just makes it less efficient.
my $count;
print
map { "$_ -> $printy{$_}".( !$count++ || exists($printy{$_-1}) ? "" : " *" )."\n" }
sort { $a <=> $b }
keys(%printy);
I'd also advise to clean up the code and keep the loop. However, here is a map based way.
The code uses your sorted #printy and the num sub.
my #nums = map { num($_) } #printy;
my #res = map {
$nums[$_] == $nums[$_-1] + 1 # invariably false for $_ == 0
? $printy[$_] : $printy[$_] .= ' *';
}
(0..$#printy);
say for #res;
This works for the first element since it does not come after the last, given that we're sorted. That may be a bit diabolical though and it needs a comment in code. So perhaps better spell it out
my #res = map {
($nums[$_] == $nums[$_-1] + 1) ? $printy[$_] : $printy[$_] .= ' *';
}
(1..$#printy);
unshift #res, $printy[0];
Not as clean but clear.
All this does extra work when compared to a straight loop, of course.
I'm sorry, but your code is a shambles, and you need to do much more than use map to clean up this code
You have no indentation and multiple statements on a single line, and you haven't thought through your logic. Your code is unmaintainable
Here's how I would write this. It builds a parallel array of IDs, and then sorts a list of indices so that both the IDs and the original data are in order
If it makes you happier, it does include map
use strict;
use warnings 'all';
my #data = ( '1 -> this', '5 -> that', '3 -> the other', '6 -> thus and such' );
my #ids = map { /(\d+)/ } #data;
my #indexes = sort { $ids[$a] <=> $ids[$b] } 0 .. $#ids;
my $last_id;
for my $i ( #indexes ) {
print $data[$i];
print ' *' if defined $last_id and $ids[$i] > $last_id + 1;
print "\n";
$last_id = $ids[$i];
}
output
1 -> this
3 -> the other *
5 -> that *
6 -> thus and such

how to deal with graph theory related problems using perl

I want to learn how to solve simple graph theory problems in perl without using any extra module.
I can explain a simple problem.
Input format:
Line 1- number of vertices of graph-N.
Next N lines- index of vertices with direct connection to vertex with index i. Index starts from 1.
index of starting point (space) index of end point, find longest route possible.
Example
4
2 3 4
1
1 4
1 3
2 4
Solution:
2 to 4 can be reached in following ways
- 2-1-4
- 2-1-3-4
so longest path is 2-1-3-4
I want to learn the basics of using perl for such problems. Any help would be highly appreciated. Give me a hint and i will try to code.
I'd use a hash of hashes to represent a graph. $graph{$v1}{$v2} exists if the edge v1-v2 is in the graph. You can represent directed graphs this way (as $graph{$v2}{$v1} doesn't have to exist). Also, if you want weighted edges, you can store the weight as the value.
To solve your example problem, I'd use something like the following:
#!/usr/bin/perl
use warnings;
use strict;
use feature qw{ say };
# Check that vertex can be added to the path.
sub already {
my ($vertex, #vertices) = #_;
for my $i (1 .. $#vertices) {
# last-v or v-last might already be present.
return 1 if ($vertices[ $i - 1 ] == $vertices[-1] and $vertices[$i] == $vertex)
or ($vertices[ $i - 1 ] == $vertex and $vertices[$i] == $vertices[-1])
}
return
}
sub path {
my ($graph, $start, $end, %known) = #_;
my $count = keys %known;
for my $path (keys %known) {
my #vertices = split '-', $path;
next if $vertices[-1] == $end;
for my $target (keys %{ $graph->{ $vertices[-1] } }) {
undef $known{"$path-$target"} unless already($target, #vertices);
}
}
if (keys %known > $count) {
return path($graph, $start, $end, %known)
} else {
return keys %known
}
}
my %graph;
my $size = <>;
for my $node (1 .. $size) {
my #targets = split ' ', <>;
undef $graph{$node}{$_} for #targets;
}
my ($start, $end) = split ' ', <>;
say "$start to $end can be reached in the following ways";
my #paths = grep /-$end$/,
path(\%graph, $start, $end, map {; "$start-$_" => undef }
keys %{ $graph{$start} });
say for #paths;
my $max = 0;
for my $i (1 .. $#paths) {
$max = $i if ($paths[$i] =~ tr/-//) > ($paths[$max] =~ tr/-//);
}
say "so longest path is $paths[$max]";

input a number and output the Fibonacci number recursively Perl

For a given value N I am trying to output the corresponding Fibonacci number F(N). My script doesn't seem to enter the recursive stage. fibonnaci($number) is not calling the subroutine. It is simply outputing "fibonacci(whatever number is inputted)".
Here is my code:
#!/usr/bin/perl -w
use warnings;
use strict;
print "Please enter value of N: ";
my $number = <STDIN>;
chomp($number);
sub fibonacci
{
my $f;
if ( $number == 0 ) { # base case
$f = 0;
} elsif ( $number == 1 ) {
$f = 1;
} else { # recursive step
$f = fibonacci( $number - 1 ) + fibonacci( $number - 2 );
}
return $f;
}
print "\nf($number) = fibonacci($number)\n";
Sample Output:
Please enter value of N: 4
f(4) = fibonacci(4)
user1:~>recursiveFib.pl
Please enter value of N: 5
f(5) = fibonacci(5)
user1:~>recursiveFib.pl
Please enter value of N: 10
f(10) = fibonacci(10)
user1:~>
Not sure where I went wrong. Any help would be greatly appreciated.
You need to accept the function arguments properly and take the function call out of the quotes.
use warnings;
use strict;
sub fibonacci {
my ($number) = #_;
if ($number < 2) { # base case
return $number;
}
return fibonacci($number-1) + fibonacci($number-2);
}
print "Please enter value of N: ";
my $number = <STDIN>;
chomp($number);
print "\n$number: ", fibonacci($number), "\n";
A more efficient but still recursive version:
sub fib_r {
my ($n,$a,$b) = #_;
if ($n <= 0) { return $a; }
else { return fib_r($n-1, $b, $a+$b); }
}
sub fib { fib_r($_[0], 0, 1); } # pass initial values of a and b
print fib(10), "\n";
Other answers have already mentioned the lack of taking an argument correctly to the fibonacci function, and that you can't interpolate a function call in the print string like that. Lately my favourite way to interpolate function calls into print strings is to use the ${\ ... } notation for embedding arbitrary expressions into strings:
print "f($number) = ${\ fibonacci($number) }\n";
Other techniques include separate arguments:
print "f($number) = ", fibonacci($number), "\n";
or a helper variable:
my $result = fibonacci($number);
print "f($number) = $result\n";
or even printf:
printf "f(%d) = %d\n", $number, fibonacci($number);
Of all these techniques I tend to prefer either of the first two, because they lead to putting the expressions "in-line" with the rest of the text string, whereas in the latter two they sit elsewhere, making it harder to see at a glance what gets printed where. Especially with printf's positional arguments, it can be easy to be "off-by-one" with a large number of arguments, and put everything in the wrong place.
You are printing in wrong way. you just need to handle the return value. Also the way you are using Number in the sub is also not seems relevant. I have updated the and its working fine.
Also the values that you wanted to print is depend on the start up of the series. whether you want to start from 0 or 1.
The series example start with 1 is 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, so if you put 10 you will get 55.
#!/usr/bin/perl -w
use warnings;
use strict;
print "Please enter value of N: ";
my $number = <STDIN>;
chomp($number);
my $result=fibonacci($number);
sub fibonacci
{
my $f =0;
if ($_[0] == 1 ) { # base case
$f = 1;
} elsif ( $_[0] == 2 ) {
$f = 1;
} else { # recursive step
$f= fibonacci( $_[0] - 1 ) + fibonacci( $_[0] - 2 );
}
return $f;
}
print "\nf($number) = $result\n";

Perl recursion techniques?

I need a bit of help with is this code. I know the sections that should be recursive, or at least I think I do but am not sure how to implement it. I am trying to implement a path finding program from an alignment matrix that will find multiple routes back to the zero value. For example if you excute my code and insert CGCA as the first sequence and CACGTAT as the second sequence, and 1, 0, and -1 as the match, mismatch and gap scores. The program gives off the path as HDHHDD and the aligment as
CACGTAT
CGC--A-.
However there are more possible paths and aligments then this, except I don't know how many. What I want to do is have a piece of my code loop back on itself and find other paths and alignments, using the same code as the first time around, until it runs out of possible alignments. The best way I found on the net to do this is recursion, except no one can explain how to do it. In this case there should be two more paths and aligmennts HDDDHHD and CACGTAT, and C--GCA-, and. HDDDDHH , CACGTAT AND --CGCA-. I just don't know how to code to perform this task.
# Implementation of Needleman and Wunsch Algorithm
my($seq1, $len1, $seq2, $len2, $data, #matrix, $i, $j, $x, $y, $val1, $val2);
my($val3, $pathrow, $pathcol, $seq1loc, $seq2loc, $gapscore, $matchscore, $mismatchscore);
#first obtain the data from the user.
print "Please enter the first sequence for comaprsion\n";
$seq1=<STDIN>;
chomp $seq1;
print "Please enter the second sequence for comparsion\n";
$seq2=<STDIN>;
chomp $seq2;
# adding extra characters so sequences align with matrix
# saves some calculations later on
$seq1 = " " . $seq1;
$seq2 = " " . $seq2;
$len1 = length($seq1);
$len2 = length($seq2);
print "Enter the match score: ";
$matchscore=<STDIN>;
chomp $matchscore;
print "Enter the mismatch score: ";
$mismatchscore=<STDIN>;
chomp $mismatchscore;
print "Enter the gap score: ";
$gapscore=<STDIN>;
chomp $gapscore;
# declare a two dimensional array and initialize to spaces
# array must contain one extra row and one extra column
#matrix = ();
for($i = 0; $i < $len1; $i++){
for($j = 0; $j < $len2; $j++){
$matrix[$i][$j] = ' ';
}
}
# initialize 1st row and 1st column of matrix
$matrix[0][0] = 0;
for ($i = 1; $i < $len1; $i ++){
$matrix[$i][0] = $matrix[$i-1][0] + $gapscore;
}
for ($i = 1; $i < $len2; $i ++){
$matrix[0][$i] = $matrix[0][$i-1] + $gapscore;
}
# STEP 1:
# Fill in rest of matrix using the following rules:
# determine three possible values for matrix[x][y]
# value 1 = add gap score to matrix[x][y-1]
# value 2 = add gap score to matrix[x-1][y]
# value 3 = add match score or mismatch score to
# matrix[x-1][y-1] depending on nucleotide
# match for position x of $seq1 and position y
# of seq2
# place the largest of the three values in matrix[x][y]
#
# Best alignment score appears in matrix[$len1][$len2].
for($x = 1; $x < $len1; $x++){
for($y = 1; $y < $len2; $y++){
$val1 = $matrix[$x][$y-1] + $gapscore;
$val2 = $matrix[$x-1][$y] + $gapscore;
if (substr($seq1, $x, 1) eq substr($seq2, $y, 1)){
$val3 = $matrix[$x-1][$y-1] + $matchscore;
}
else{
$val3 = $matrix[$x-1][$y-1] + $mismatchscore;
}
if (($val1 >= $val2) && ($val1 >= $val3)){
$matrix[$x][$y] = $val1;
}
elsif (($val2 >= $val1) && ($val2 >= $val3)){
$matrix[$x][$y] = $val2;
}
else{
$matrix[$x][$y] = $val3;
}
}
}
# Display scoring matrix
print "MATRIX:\n";
for($x = 0; $x < $len1; $x++){
for($y = 0; $y < $len2; $y++){
print "$matrix[$x][$y] ";
}
print "\n";
}
print "\n";
# STEP 2:
# Begin at matrix[$len1][$len2] and find a path to
# matrix[0][0].
# Build string to hold path pattern by concatenating either
# 'H' (current cell produced by cell horizontally to left),
# 'D' (current cell produced by cell on diagonal),
# 'V' (current cell produced by cell vertically above)
# ***This is were I need help I need this code to be recursive, so I can find more then one path***
$pathrow = $len1-1;
$pathcol = $len2-1;
while (($pathrow != 0) || ($pathcol != 0)){
if ($pathrow == 0){
# must be from cell to left
$path = $path . 'H';
$pathcol = $pathcol - 1;
}
elsif ($pathcol == 0){
# must be from cell above
$path = $path . 'V';
$pathrow = $pathrow - 1;
}
# could be from any direction
elsif (($matrix[$pathrow][$pathcol-1] + $gapscore) == $matrix[$pathrow][$pathcol]){
# from left
$path = $path . 'H';
$pathcol = $pathcol - 1;
}
elsif (($matrix[$pathrow-1][$pathcol] + $gapscore) == $matrix[$pathrow][$pathcol]){
# from above
$path = $path . 'V';
$pathrow = $pathrow - 1;
}
else{
# must be from diagonal
$path = $path . 'D';
$pathrow = $pathrow - 1;
$pathcol = $pathcol - 1;
}
}
print "Path is $path\n";
# STEP 3:
# Determine alignment pattern by reading path string
# created in step 2.
# Create two string variables ($alignseq1 and $alignseq2) to hold
# the sequences for alignment.
# Reading backwards from $seq1, $seq2 and path string,
# if string character is 'D', then
# concatenate to front of $alignseq1, last char in $seq1
# and to the front of $alignseq2, last char in $seq2
# if string character is 'V', then
# concatenate to front of $alignseq1, last char in $seq1
# and to the front of $alignseq2, the gap char
# if string character is 'H', then
# concatenate to front of $alignseq1 the gap char
# and to the front of $alignseq2, last char in $seq2
# Continue process until path string has been traversed.
# Result appears in string $alignseq1 and $seq2
***#I need this code to be recursive as well.***
$seq1loc = $len1-1;
$seq2loc = $len2-1;
$pathloc = 0;
print length($path);
while ($pathloc < length($path)){
if (substr($path, $pathloc, 1) eq 'D'){
$alignseq1 = substr($seq1, $seq1loc, 1) . $alignseq1;
$alignseq2 = substr($seq2, $seq2loc, 1) . $alignseq2;
$seq1loc--;
$seq2loc--;
}
elsif (substr($path, $pathloc, 1) eq 'V'){
$alignseq1 = substr($seq1, $seq1loc, 1) . $alignseq1;
$alignseq2 = '-' . $alignseq2;
$seq1loc--;
}
else{ # must be an H
$alignseq1 = '-' . $alignseq1;
$alignseq2 = substr($seq2, $seq2loc, 1) . $alignseq2;
$seq2loc--;
}
$pathloc++;
}
print "\nAligned Sequences:\n";
print "$alignseq2 \n";
print "$alignseq1 \n";
# statement may be needed to hold output screen
print "Press any key to exit program";
$x = <STDIN>;
If anyone is wondering this is a needleman-wunsch algorithm. Any help here would be greatly apperciated.
I can't provide an answer, because I don't understand exactly what you are try to do, but I can offer some general advice.
Start organizing your code into discrete subroutines that perform narrowly defined tasks. In addition, the subroutines that implement your central algorithms should not be oriented toward receiving input from the keyboard and producing output to the screen; rather they should receive input as arguments and return their results. If there is a need for user input or screen output, those tasks should be in separate subroutines, not comingled with your primary algorithms.
A first (and partial) step down that path is to take you entire program, enclose it in a subroutine definition, and then call the subroutine with the required arguments. Instead of printing its key results, the subroutine should return them -- specifically, a reference to #matrix along with the values for $path, $alignseq1, $alignseq2.
sub NW_algo {
my ($seq1, $seq2, $matchscore, $mismatchscore, $gapscore) = #_;
# The rest of your code here, but with all print
# statements and <STDIN> inputs commented out.
return \#matrix, $path, $alignseq1, $alignseq2;
}
my(#return_values) = NW_algo('CGCA', 'CACGTAT', 1, 0, -1);
Print_matrix($return_values[0]);
sub Print_matrix {
for my $m ( #{$_[0]} ){
print join(' ', #$m), "\n";
}
}
At this point, you'll have an algorithm that can be invoked by other code, making it easier to test and debug your program going forward. For example, you could define various sets of input data and run NW_algo() on each set. Only then will it be possible to think about recursion or other techniques.
Since Needleman-Wunsch is a dynamic-programming algorithm, most of the work is already done by the time you compute your DP matrix. Once you have your DP matrix, you're supposed to backtrack through the matrix to find the optimal alignment. The problem is a bit like taxicab geometry except that you can move diagonally. Essentially, when you need to backtrack through the matrix, instead of choosing between going up, left, or diagonally, you do all three by making three recursive calls, and each of those call themselves for each of up, left, or diagonally, until you reach your starting point. The path traced by each strand of recursion will draw out each alignment.
EDIT: So basically you need to put Step 2 in a subprocedure (that takes position and the path traced so far), so it can call itself over and over. Of course, after you define the procedure you need to make one call to it to actually start the tracing process:
sub tracePaths {
$x = shift;
$y = shift;
$pathSoFar = shift; # assuming you're storing your path as a string
#
# ... do some work ...
#
tracePaths($x - 1, $y, $pathSoFar . something);
tracePaths($x, $y - 1, $pathSoFar . somethingelse);
tracePaths($x - 1, $y - 1, $pathSoFar . somethingelselse);
#
#
#
if(reached the end) return $pathSoFar;
}
# launch the recursion
tracePaths(beginningx, beginningy, "");
This doesn't speak specifically to your problem, but you should maybe check out the book Higher Order Perl. It goes over how to use a lot of higher-level techniques (such as recursion).