Alignment out of place when done through cgi-perl - perl

I've been trying to implement global alignment algorithm using cgi bioperl. The code works fine and the alignment between the two sequences is perfect using a command prompt. But using cgi the alignment of the two sequences is not perfect. It kinda moves out of place. Here's my code for alignment:
#firstarray = split //, $align1;
#secondarray = split //, $align2;
$sizeoffirst = $#firstarray + 1;
$sizeofsecond = $#secondarray + 1;
print "$sizeoffirst\n"."<br/>";
print "$sizeofsecond\n"."<br/>";
$k = 0;
while ($k <= $sizeoffirst)
{
$count = 1;
$l = $k;
while ($count <= 30)
{
print $firstarray[$l];
$count++;
$l++
}
print "\n"."<br/>";
$count = 1;
$m = $k;
while ($count <= 30)
{
print $secondarray[$m];
$count++;
$m++;
}
print "\n"."<br/>";
print "\n"."<br/>";
$count = 1;
$k = $k + 30;
}
The "dash"(in order to denote a gap) is quite smaller as compare to the letters. Hence the alignment moves out of the place. What can I do?

you should use <pre> so that your formatting is maintained on a web page.
– Barmar
Or a <table>. It's the way to display tabulated data. Also, ALWAYS use strict; use warnings;.
– Toto

Related

Scoping in Perl

As a biology student, I'm trying to extend my programming knowledge and I ran into a problem with Perl.
I'm trying to create a program that generates random DNA strings and performs analysis work on the generated data.
In the first part of the program, I am able to print out the strings stored in the array, but the second part I cannot retrieve all but one of the elements of the array.
Could this be part of the scoping rules of Perl?
#!usr/bin/perl
# generate a random DNA strings and print it to file specified by the user.
$largearray[0] = 0;
print "How many nucleotides for the string?\n";
$n = <>;
$mylong = $n;
print "how many strings?\n";
$numstrings = <>;
# #largearray =();
$j = 0;
while ( $j < $numstrings ) {
$numstring = ''; # start with the empty string;
$dnastring = '';
$i = 0;
while ( $i < $n ) {
$numstring = int( rand( 4 ) ) . $numstring; # generate a new random integer
# between 0 and 3, and concatenate
# it with the existing $numstring,
# assigning the result to $numstring.
$i++; # increase the value of $i by one.
}
$dnastring = $numstring;
$dnastring =~ tr/0123/actg/; # translate the numbers to DNA characters.
#print $dnastring;
#print "\n";
$largearray[j] = $dnastring; #append generated string to end of array
#print $largearray[j];
#print $j;
#IN HERE THERE ARE GOOD ARRAY VALUES
#print "\n";
$j++;
}
# ii will be used to continuously take the next couple of strings from largearray
# for LCS matching.
$mytotal = 0;
$ii = 0;
while ( $ii < $numstrings ) {
$line = $largearray[ii];
print $largearray[ii]; #CANNOT RETRIEVE ARRAY VALUES
print "\n";
$ii++;
#string1 = split( //, $line );
$line = $largearray[ii];
#print $largearray[ii];
#print "\n";
$ii++;
chomp $line;
#string2 = split( //, $line );
$n = #string1; #assigning a list to a scalar just assigns the
#number of elements in the list to the scalar.
$m = #string2;
$v = 1;
$Cm = 0;
$Im = 0;
$V[0][0] = 0; # Assign the 0,0 entry of the V matrix
for ( $i = 1; $i <= $n; $i++ ) { # Assign the column 0 values and print
# String 1 See section 5.2 of Johnson
# for loops
$V[$i][0] = -$Im * $i;
}
for ( $j = 1; $j <= $m; $j++ ) { # Assign the row 0 values and print String 2
$V[0][$j] = -$Im * $j;
}
for ( $i = 1; $i <= $n; $i++ ) { # follow the recurrences to fill in the V matrix.
for ( $j = 1; $j <= $m; $j++ ) {
# print OUT "$string1[$i-1], $string2[$j-1]\n"; # This is here for debugging purposes.
if ( $string1[ $i - 1 ] eq $string2[ $j - 1 ] ) {
$t = 1 * $v;
}
else {
$t = -1 * $Cm;
}
$max = $V[ $i - 1 ][ $j - 1 ] + $t;
# print OUT "For $i, $j, t is $t \n"; # Another debugging line.
if ( $max < $V[$i][ $j - 1 ] - 1 * $Im ) {
$max = $V[$i][ $j - 1 ] - 1 * $Im;
}
if ( $V[ $i - 1 ][$j] - 1 * $Im > $max ) {
$max = $V[ $i - 1 ][$j] - 1 * $Im;
}
$V[$i][$j] = $max;
}
} #outer for loop
print $V[$n][$m];
$mytotal += $V[$n][$m]; # append current result to the grand total
print "\n";
} # end while loop
print "the average LCS value for length ", $mylong, " strings is: ";
print $mytotal/ $numstrings;
This isn't a scoping issue. You have declared none of your variables, which has the effect of implicitly making them all global and accessible everywhere in your code
I reformatted your Perl program so that I could read it, and then added this to the top of your program
use strict;
use warnings 'all';
which are essential in every Perl program you write
Then I added
no strict 'vars';
which is a very bad idea, and lets you get away without declaring any variables
The result is this
Argument "ii" isn't numeric in array element at E:\Perl\source\dna.pl line 60.
Argument "ii" isn't numeric in array element at E:\Perl\source\dna.pl line 61.
Argument "ii" isn't numeric in array element at E:\Perl\source\dna.pl line 67.
Argument "j" isn't numeric in array element at E:\Perl\source\dna.pl line 42.
Bareword "ii" not allowed while "strict subs" in use at E:\Perl\source\dna.pl line 60.
Bareword "ii" not allowed while "strict subs" in use at E:\Perl\source\dna.pl line 61.
Bareword "ii" not allowed while "strict subs" in use at E:\Perl\source\dna.pl line 67.
Bareword "j" not allowed while "strict subs" in use at E:\Perl\source\dna.pl line 42.
Execution of E:\Perl\source\dna.pl aborted due to compilation errors.
Line 42 (of my reformatted version) is
$largearray[j] = $dnastring
and lines 60, 61 and 67 are
$line = $largearray[ii];
print $largearray[ii]; #CANNOT RETRIEVE ARRAY VALUES
and
$line = $largearray[ii];
You are using j and ii as array indexes. Those are Perl subroutine calls, not variables. Adding use strict would have stopped this from compiling unless you had also declared sub ii and sub j
You might get away with it if you just change j and ii to $j and $ii, but you are certain to get into further problems
Please make the same changes to your own code, and declare every variable that you need using my as close as possible to the first place they are used
You should also improve your variable naming. Things like #largearray are pointless: the # says that it's an array, and whether it's large or not is relative, and of little use in understanding your code. If you have no better description of its purpose then #table or #data are probably a little better
Likewise, please avoid capital letters and most single-letter names. #V, $Cm and $Im are meaningless, and you would need fewer comments if those names were better
You certainly wouldn't need comments like # end while loop and # outer for loop if you had indented your blocks properly and kept them short enough so that both the beginning and the end can be seen on the screen at the same time, and the fewer comments you can get away with the better, because they badly clutter the code structure
Finally, it's worth noting that the C-style for loop is rarely the best choice in Perl. Your
for ( $i = 1; $i <= $n; $i++ ) { ... }
is much clearer as
for my $i ( 1 .. $n ) { ... }
and declaring the control variable at that point makes it unnecessary to invent new names like $ii for each new loop
I think you have a typo in your code:
ii => must be $ii
don't forget to put this at the beginning of your code:
use strict;
use warnings;
in order to avoid this (and others) kind of errors

Using big numbers in Perl

I have a scenario where I take 2 very big binary strings (having 100 characters) and I need to add them.
The issue is that I am getting the answer in the form 2.000xxxxxxxxxxe+2, whereas I need the precise answer, as another 100 character long string.
chomp($str1=<STDIN>);
chomp($str2=<STDIN>);
print "Str 1 is $str1\n";
print "Str 2 is $str2\n";
$t = $str1 + $str2;
print "Sum is $t\n";
Sample Input
1001101111101011011100101100100110111011111011000100111100111110111101011011011100111001100011111010
1001101111101011011100101100100110111011111011000100111100111110111101011011011100111001100011111010
Sample Output
Str1 is
1001101111101011011100101100100110111011111011000100111100111110111101011011011100111001100011111010
Str2 is
1001101111101011011100101100100110111011111011000100111100111110111101011011011100111001100011111010
Sum is
2.0022022220202e+099
As already suggested, you can use Math::BigInt core module,
use Math::BigInt;
# chomp($str1=<STDIN>);
# chomp($str2=<STDIN>);
# print "Str 1 is $str1\n";
# print "Str 2 is $str2\n";
my $t = Math::BigInt->new("0b$str1") + Math::BigInt->new("0b$str2");
print $t->as_bin;
In order to perform arithmetic on your strings, Perl converts them to floating-point numbers, which are inherently imprecise. If you want to avoid that, use Math::BigInt as already suggested ... or roll your own.
######## WARNING/GUARANTEE: This is practically certain to be
# slower, buggier, less portable, and less secure than Math::BigInt.
# In fact, I planted a security hole just to prove a point. Enjoy.
use strict;
use warnings;
sub addition {
my ($int1, $int2) = #_;
my #int1 = reverse split //, $int1;
my #int2 = reverse split //, $int2;
my $len = scalar(#int1>#int2 ? #int1 : #int2);
my #result;
my $carry = 0;
for (my $i=0; $i < $len; ++$i)
{
$int1[$i] |= 0;
$int2[$i] |= 0;
my $sum = $carry + $int1[$i] + $int2[$i];
if ($sum >= 10)
{
$carry = int($sum / 10);
$sum %= 10;
}
push #result, $sum;
}
push #result, $carry if $carry;
return join ('', reverse #result);
}

Perl. How can I make output disappear after several seconds?

I'm prompting a user for a correct answer for example:
/> 13 + 7 ?
is it any way of making this output disappear after 2 seconds for example ?
..thanks' for any suggestions
You're asking for a few things combined, I think:
1) how do you erase a line
2) how do you wait for input for a while and then give up on waiting (ie, a timer)
The following code will do what you want (there are other ways of accomplishing both of the above, but the below shows one way for each of the above tasks):
use strict; use warnings;
use IO::Select;
my $stdin = IO::Select->new();
$stdin->add(\*STDIN);
# always flush
$| = 1;
my $question = "/> 7 + 3 ? ";
print $question;
if ($stdin->can_read(2)) {
print "you entered: " . <STDIN>;
} else {
print "\010" x length($question);
print " " x length($question);
print "too late\n";
}
Use select on STDIN to see whether there is any input within 2 seconds. If not, overwrite the output with a carriage return (\r) or multiple backspaces (\b).
Proof of concept:
$| = 1; # needed because print calls don't always use a newline
$i = int(rand() * 10);
$j = int(rand() * 10);
$k = $i + $j;
print "What is $i + $j ? ";
$rin = '';
vec($rin, fileno(STDIN), 1) = 1;
$n = select $rout=$rin, undef, undef, 2.0;
if ($n) {
$answer = <STDIN>;
if ($answer == $k) {
print "You are right.\n";
} else {
print "You are wrong. $i + $j is $k\n";
}
} else {
print "\b \b" x 15;
print "\n\n";
print "Time's up!\n";
sleep 1;
}
When you are ready for a more advanced solution, you could probably check out Term::ReadKey (so you don't have to hit Enter after you type in your answer) or something like Curses to exercise more control over writing to arbitrary spots on your terminal.

Perl Recursion and Functions

Having heard about Perl for year I decided to give it a few hours of my time to see how much I could pick up. I got through the basics fine and then got to loops. As a test I wanted to see if I could build a script to recurse through all alphanumerical values of up to 4 characters. I had written a PHP code that did the same thing some time ago so I took the same concept and used it. However when I run the script it puts "a" as the first 3 values and then only loops through the last digit. Anyone see what I am doing wrong?
#!/usr/local/bin/perl
$chars = "abcdefghijklmnopqrstuvwxyz";
$chars .= "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
$chars .= "0123456789";
#charset = split(//, $chars);
$charset_length = scalar(#charset);
sub recurse
{
($width, $position, $base_string) = #_;
for ($i = 0; $i < $charset_length; ++$i) {
$base = $base_string . $charset[$i];
if ($position < $width - 1) {
$pos = $position + 1;
recurse($width, $pos, $base);
}
print $base;
print "\n";
}
}
recurse(4, 0, '');
This is what I get when I run it:
aaaa
aaab
aaac
aaad
aaae
aaaf
aaag
aaah
aaai
aaaj
aaak
aaal
aaam
aaan
aaao
aaap
aaaq
aaar
aaas
aaat
aaau
aaav
aaaw
aaax
aaay
aaaz
aaaA
aaaB
aaaC
aaaD
aaaE
aaaF
aaaG
aaaH
aaaI
aaaJ
aaaK
aaaL
aaaM
aaaN
aaaO
aaaP
aaaQ
aaaR
aaaS
aaaT
aaaU
aaaV
aaaW
aaaX
aaaY
aaaZ
aaa0
aaa1
aaa2
aaa3
aaa4
aaa5
aaa6
aaa7
aaa8
aaa9
aaa9
aaa9
aaa9
You've been bitten by non strict scoping, this code does what it should (note the use strict at the top and the subsequent use of my to guarantee variable scoping).
#!/usr/bin/env perl
use strict;
use warnings;
my $chars = "abcdefghijklmnopqrstuvwxyz";
$chars .= "ABCDEFGHIJKLMNOPQRSTUVWXYZ";
$chars .= "0123456789";
my #charset = split(//, $chars);
my $charset_length = scalar(#charset);
sub recurse {
my ($width, $position, $base_string) = #_;
for (my $i = 0; $i < $charset_length; ++$i) {
my $base = $base_string . $charset[$i];
if ($position < $width - 1) {
my $pos = $position + 1;
recurse($width, $pos, $base);
}
print $base;
print "\n";
}
}
recurse(4, 0, '');
Already well answered, but a more idiomatic approach would be:
use strict;
use warnings;
sub recurse {
my ($width, $base_string, $charset) = #_;
if (length $base_string) {
print "$base_string\n";
}
if (length($base_string) < $width) {
$recurser->($base_string . $_) for #$charset;
}
}
my #charset = ('a'..'z', 'A'..'Z', '0'..'9');
recurse(4, '', \#charset);
There's no need to pass position; it's implicit in the width of the base string passed in. The charset, on the other hand, should be passed in rather than having the subroutine use an external variable.
Alternatively, since the width and character set stay constant, generate a closure that references them:
use strict;
use warnings;
sub make_recurser {
my ($width, $charset) = #_;
my $recurser;
$recurser = sub {
my ($base_string) = #_;
if (length $base_string) {
print "$base_string\n";
}
if (length($base_string) < $width) {
$recurser->($base_string . $_) for #$charset;
}
}
}
my #charset = ('a'..'z', 'A'..'Z', '0'..'9');
my $recurser = make_recurser(4, \#charset);
$recurser->('');
Alternatively, just:
print "$_\n" for glob(('{' . join(',', 'a'..'z', 'A'..'Z', '0'..'9') . '}') x 4);
It has to do with the scope of the variables, you're still changing the same vars when you're calling the recursion. The keyword 'my' declares the variables local to the subroutine.
(http://perl.plover.com/FAQs/Namespaces.html)
I always use perl with 'use strict;' declared, forcing me to decide on the scope of the variables.
sub recurse {
my ($width, $position, $base_string) = #_;
for (my $i = 0; $i < $charset_length; ++$i) {
my $base = $base_string . $charset[$i];
if ($position < $width - 1) {
my $pos = $position + 1;
recurse($width, $pos, $base);
}
print $base;
print " ";
}
}
You seem to be running into some scoping issues. Perl is very flexible, so it is taking a guess at what you want because you haven't told it what you want. One of the first things you'll learn is to add use strict; as for your first statement after the shebang. It will point out the variables that are not being explicitly defined, as well as any variables that are accessed before being created (helps with misspelled variables, etc).
If you make your code look like this, you'll see why you are getting your errors:
sub recurse {
($width, $position, $base_string) = #_;
for ($i = 0; $i < $charset_length; ++$i) {
$base = $base_string . $charset[$i];
if ($position < $width - 1) {
$pos = $position + 1;
recurse($width, $pos, $base);
}
# print "$base\n";
}
print "$position\n";
}
This should output:
3
3
3
3
Because you are not scoping $position correctly with my, you aren't getting a new variable each recurse, you are re-using the same one. Toss a use strict; in there, and fix the errors you get, and the code should be good.
I realize that you're just tinkering with recursion. But as long as you're having fun comparing implementations between two languages you may as well also see how the CPAN can extend your tool set.
If you don't care about the order, you can generate all 13,388,280 permutations of ( 'a'..'z', 'A..'Z', '0'..'9' ) taken four at a time with the CPAN module, Algorithm::Permute
Here is an example of how that code may look.
use strict;
use warnings;
use Algorithm::Permute;
my $p = Algorithm::Permute->new(
[ 'a' .. 'z', 'A' .. 'Z', '0' .. '9' ], # Set of...
4 # <---- at a time.
);
while ( my #res = $p->next ) {
print #res, "\n";
}
The new() method accepts an array ref that enumerates the character set or list of what to permute. Its second argument is how many at a time to include in the permutation. So you're essentially taking 62 items 4 at a time. Then use the next() method to iterate through the permutations. The rest is just window dressing.
The same thing could be reduced to the following Perl one-liner:
perl -MAlgorithm::Permute -e '$p=Algorithm::Permute->new(["a".."z","A".."Z",0..9],4);print #r, "\n" while #r=$p->next;'
There is also a section on permutation, along with additional examples in perlfaq4. It includes several examples and lists some additional modules that handle the details for you. One of Perl's strengths is the size and completeness of the Comprehensive Perl Archive Network (the CPAN).

How can I extract/parse tabular data from a text file in Perl?

I am looking for something like HTML::TableExtract, just not for HTML input, but for plain text input that contains "tables" formatted with indentation and spacing.
Data could look like this:
Here is some header text.
Column One Column Two Column Three
a b
a b c
Some more text
Another Table Another Column
abdbdbdb aaaa
Not aware of any packaged solution, but something not very flexible is fairly simple to do assuming you can do two passes over the file: (the following is partially Perlish pseudocode example)
Assumption: data may contain spaces and is NOT quoted ala CSV if there's a space - if this is not the case, just use Text::CSV(_XS).
Assumption: no tabs used for formatting.
The logic defines a "column separator" to be any consecutive set of vertical rows populated 100% with spaces.
If by accident every row has a space which is part of the data at offset M characters, the logic will consider offset M to be a column separator, since it can't know any better. The ONLY way it can know better is if you require column separation to be at least X spaces where X>1 - see the second code fragment for that.
Sample code:
my $INFER_FROM_N_LINES = 10; # Infer columns from this # of lines
# 0 means from entire file
my $lines_scanned = 0;
my #non_spaces=[];
# First pass - find which character columns in the file have all spaces and which don't
my $fh = open(...) or die;
while (<$fh>) {
last if $INFER_FROM_N_LINES && $lines_scanned++ == $INFER_FROM_N_LINES;
chomp;
my $line = $_;
my #chars = split(//, $line);
for (my $i = 0; $i < #chars; $i++) { # Probably can be done prettier via map?
$non_spaces[$i] = 1 if $chars[$i] ne " ";
}
}
close $fh or die;
# Find columns, defined as consecutive "non-spaces" slices.
my #starts, #ends; # Index at which columns start and end
my $state = " "; # Not inside a column
for (my $i = 0; $i < #non_spaces; $i++) {
next if $state eq " " && !$non_spaces[$i];
next if $state eq "c" && $non_spaces[$i];
if ($state eq " ") { # && $non_spaces[$i] of course => start column
$state = "c";
push #starts, $i;
} else { # meaning $state eq "c" && !$non_spaces[$i] => end column
$state = " ";
push #ends, $i-1;
}
}
if ($state eq "c") { # Last char is NOT a space - produce the last column end
push #ends, $#non_spaces;
}
# Now split lines
my $fh = open(...) or die;
my #rows = ();
while (<$fh>) {
my #columns = ();
push #rows, \#columns;
chomp;
my $line = $_;
for (my $col_num = 0; $col_num < #starts; $col_num++) {
$columns[$col_num] = substr($_, $starts[$col_num], $ends[$col_num]-$starts[$col_num]+1);
}
}
close $fh or die;
Now, if you require column separation to be at least X spaces where X>1, it's also doable but the parser of column locations needs to be a bit more complex :
# Find columns, defined as consecutive "non-spaces" slices separated by at least 3 spaces.
my $min_col_separator_is_X_spaces = 3;
my #starts, #ends; # Index at which columns start and end
my $state = "S"; # inside a separator
NEXT_CHAR: for (my $i = 0; $i < #non_spaces; $i++) {
if ($state eq "S") { # done with last column, inside a separator
if ($non_spaces[$i]) { # start a new column
$state = "c";
push #starts, $i;
}
next;
}
if ($state eq "c") { # Processing a column
if (!$non_spaces[$i]) { # First space after non-space
# Could be beginning of separator? check next X chars!
for (my $j = $i+1; $j < #non_spaces
|| $j < $i+$min_col_separator_is_X_spaces; $j++) {
if ($non_spaces[$j]) {
$i = $j++; # No need to re-scan again
next NEXT_CHAR; # OUTER loop
}
# If we reach here, next X chars are spaces! Column ended!
push #ends, $i-1;
$state = "S";
$i = $i + $min_col_separator_is_X_spaces;
}
}
next;
}
}
Here's a very quick solution, commented with an overview. (My apologies for the length.) Basically, if a "word" appears after the start of column header n, then it ends up in column n, unless most of its body trails into column n + 1, in which case it ends up there instead. Tidying this up, extending it to support multiple different tables, etc. are left as an exercise. You could also use something other than the left offset of the column header as the boundary mark, such as the centre, or some value determined by the column number.
#!/usr/bin/perl
use warnings;
use strict;
# Just plug your headers in here...
my #headers = ('Column One', 'Column Two', 'Column Three');
# ...and get your results as an array of arrays of strings.
my #result = ();
my $all_headers = '(' . (join ').*(', #headers) . ')';
my $found = 0;
my #header_positions;
my $line = '';
my $row = 0;
push #result, [] for (1 .. #headers);
# Get lines from file until a line matching the headers is found.
while (defined($line = <DATA>)) {
# Get the positions of each header within that line.
if ($line =~ /$all_headers/) {
#header_positions = #-[1 .. #headers];
$found = 1;
last;
}
}
$found or die "Table not found! :<\n";
# For each subsequent nonblank line:
while (defined($line = <DATA>)) {
last if $line =~ /^$/;
push #{$_}, "" for (#result);
++$row;
# For each word in line:
while ($line =~ /(\S+)/g) {
my $word = $1;
my $position = $-[1];
my $length = $+[1] - $position;
my $column = -1;
# Get column in which word starts.
while ($column < $#headers &&
$position >= $header_positions[$column + 1]) {
++$column;
}
# If word is not fully within that column,
# and more of it is in the next one, put it in the next one.
if (!($column == $#headers ||
$position + $length < $header_positions[$column + 1]) &&
$header_positions[$column + 1] - $position <
$position + $length - $header_positions[$column + 1]) {
my $element = \$result[$column + 1]->[$row];
$$element .= " $word";
# Otherwise, put it in the one it started in.
} else {
my $element = \$result[$column]->[$row];
$$element .= " $word";
}
}
}
# Output! Eight-column tabs work best for this demonstration. :P
foreach my $i (0 .. $#headers) {
print $headers[$i] . ": ";
foreach my $c (#{$result[$i]}) {
print "$c\t";
}
print "\n";
}
__DATA__
This line ought to be ignored.
Column One Column Two Column Three
These lines are part of the tabular data to be processed.
The data are split based on how much words overlap columns.
This line ought to be ignored also.
Sample output:
Column One: These lines are The data are split
Column Two: part of the tabular based on how
Column Three: data to be processed. much words overlap columns.