PERL Script for discerning between cavity and void space in PDB(Protein Database) file - perl

The problem with the following code is only in one function of the code. The problem function is with a comment head and close. This is my first post to StackOverflow so bear with me. The following script has some modules and other functions that I know work by testing them with the problem function commented out but I just cannot seem to get that one function to work. When ran, the script runs until the enviroment kills the execution.
Basically what this program does is takes a PDB file, copies everything out of the PDB file and creates a new one and pastes all of the original input file content into the new file and appends the cavities(coordinates of center of the cavity and the specified probe radius) that the program is supposed to find.
The problem function within the code is supposed to distinguish between a void space within a bound box of the structure and a cavity. Cavities are considered to be a closed space somewhere within the structure. A void space is any space or coordinate within the bounding box of max and min coorindates where there isn't an atom.The cavity must be large enough to fit into a specified probe radius. There is also a specified resolution when searching through the 3D hashtable of coordinates.
Can anyone tell me why my code isn't working. Anything immediate. I have tested and tested and cannot seem to find the error.
Thank you.
#!/usr/bin/perl
# Example 11-6 Extract atomic coordinates from PDB file
use strict;
use warnings;
use BeginPerlBioinfo; # see Chapter 6 about this module
#open file for printing
open(FH,">results.pdb");
open(PDB,"oneAtom.pdb");
while(<PDB>) { print FH $_; }
close(PDB);
# Read in PDB file
my #file = get_file_data('oneAtom.pdb');
# Parse the record types of the PDB file
my %recordtypes = parsePDBrecordtypes(#file);
# Extract the atoms of all chains in the protein
my %atoms = parseATOM ( $recordtypes{'ATOM'} );
#define some variables and get the atom indices stored in atom_numbers array
my #atom_numbers = sort {$a <=> $b} keys %atoms;
my $resolution = 4.;
my $lo = 1000;
my $hi = -1000;
my $p_rad = 1;
my %pass;
#set the grid boundaries
foreach my $l ( #atom_numbers ) {
for my $i (0..2) {
if ( $atoms{$l}[$i] < $lo ) { $lo = $atoms{$l}[$i]; }
if ( $atoms{$l}[$i] > $hi ) { $hi = $atoms{$l}[$i]; }
}
}
$lo = $lo - 2* $resolution;
$hi = $hi + 2* $resolution;
#compute min distance to the pdb structure from each grid point
for ( my $i = $lo ; $i <= $hi ; $i = $i + $resolution ) {
for ( my $j = $lo ; $j <= $hi ; $j = $j + $resolution ) {
for ( my $k = $lo ; $k <= $hi ; $k = $k + $resolution ) {
my $min_dist = 1000000;
foreach my $l ( #atom_numbers ) {
my $distance = sqrt((($atoms{$l}[0]-($i))*($atoms{$l}[0]-($i))) + (($atoms{$l}[1]-($j))*($atoms{$l}[1]-($j))) + (($atoms{$l}[2]-($k))*($atoms{$l}[2]-($k))));
$distance = $distance - ( $p_rad + $atoms{$l}[3] );
if ( $distance < $min_dist ) {
$min_dist = $distance;
}
}
$pass{$i}{$j}{$k} = $min_dist;
if ( $pass{$i}{$j}{$k} > 0 ) {
$pass{$i}{$j}{$k} = 1;
} else { $pass{$i}{$j}{$k} = 0;
}
}
}
}
#define a starting point on the outside of the grid and place first on list of points
#my #point = ();
my $num_cavities = 0;
#define some offsets used to compute neighbors
my %offset = (
1 => [-1*$resolution,0,0],
2 => [1*$resolution,0,0],
3 => [0,-1*$resolution,0],
4 => [0,1*$resolution,0],
5 => [0,0,-1*$resolution],
6 => [0,0,1*$resolution],
);
##########################################################
#function below with problem
##########################################################
my #point = ();
push #point,[$hi,$hi,$hi];
=pod
#do the following while there are points on the list
while ( #point ) {
foreach my $vector ( keys %offset ) { #for each offset vector
my #neighbor = (($point[0][0]+$offset{$vector}[0]),($point[0][1]+$offset{$vector}[1]),($point[0][2]+$offset{$vector}[2])); #compute neighbor point
if ( exists $pass{$neighbor[0]}{$neighbor[1]}{$neighbor[2]} ) { #see if neighbor is in the grid
if ( $pass{$neighbor[0]}{$neighbor[1]}{$neighbor[2]} == 1 ) { #if it is see if its further than the probe radius
push #point,[($point[0][0]+$offset{$vector}[0]),($point[0][1]+$offset{$vector}[1]),($point[0][2]+$offset{$vector}[2])]; #if it is push it onto the list of points
}
}
}
$pass{$point[0][0]}{$point[0][1]}{$point[0][2]} = 0; #eliminate the point just tested from the pass array
shift #point; #move to the next point in the list
}
=cut
##############################################################
# end of problem function
##############################################################
my $grid_ind = $atom_numbers[$#atom_numbers];
for ( my $i = $lo ; $i <= $hi ; $i = $i + $resolution ) {
for ( my $j = $lo ; $j <= $hi ; $j = $j + $resolution ) {
for ( my $k = $lo ; $k <= $hi ; $k = $k + $resolution ) {
if ( $pass{$i}{$j}{$k} == 1 ) {
$grid_ind = $grid_ind + 1;
my $n = sprintf("%5d",$grid_ind);
my $x = sprintf("%7.3f",$i);
my $y = sprintf("%7.3f",$j);
my $z = sprintf("%7.3f",$k);
my $w = sprintf("%6.3f",1);
my $p = sprintf("%6.3f",$p_rad);
print FH "ATOM $n MC CAV $n $x $y $z $w $p \n";
}
}
}
}
close(FH);
exit;
#do the following while there are points on the list
for ( my $i = $lo ; $i <= $hi ; $i = $i + $resolution ) {
for ( my $j = $lo ; $j <= $hi ; $j = $j + $resolution ) {
for ( my $k = $lo ; $k <= $hi ; $k = $k + $resolution ) {
if ( $pass{$i}{$j}{$k} == 1 ) {
push #point,[$i,$j,$k];
$num_cavities++;
while ( #point ) {
foreach my $vector ( keys %offset ) { #for each offset vector
my #neighbor = (($point[0][0]+$offset{$vector}[0]),($point[0][1]+$offset{$vector}[1]),($point[0][2]+$offset{$vector}[2])); #compute neighbor point
if ( exists $pass{$neighbor[0]}{$neighbor[1]}{$neighbor[2]} ) { #see if neighbor is in the grid
if ( $pass{$neighbor[0]}{$neighbor[1]}{$neighbor[2]} == 1 ) { #if it is see if its further than the probe radius
push #point,[($point[0][0]+$offset{$vector}[0]),($point[0][1]+$offset{$vector}[1]),($point[0][2]+$offset{$vector}[2])]; #if it is push it onto the list of points
}
}
}
$pass{$point[0][0]}{$point[0][1]}{$point[0][2]} = 0; #eliminate the point just tested from the pass array
shift #point; #move to the next point in the list
}
}
}
}
}
#print the results
print "\nthe structure has " . $num_cavities . " cavities.\n\n";
#print the point that are left over (these correspond to the cavities)
#for ( my $i = -10 ; $i <= 10 ; $i = $i + $resolution ) {
# for ( my $j = -10 ; $j <= 10 ; $j = $j + $resolution ) {
# for ( my $k = -10 ; $k <= 10 ; $k = $k + $resolution ) {
# print $i . "\t" . $j . "\t" . $k . "\t" . $pass{$i}{$j}{$k} . "\n";
# }
# }
#}
###################################################
# function
###################################################
sub parseATOM {
my($atomrecord) = #_;
use strict;
use warnings;
my %results = ( );
# Turn the scalar into an array of ATOM lines
my(#atomrecord) = split(/\n/, $atomrecord);
foreach my $record (#atomrecord) {
my $number = substr($record, 6, 5); # columns 7-11
my $x = substr($record, 30, 8); # columns 31-38
my $y = substr($record, 38, 8); # columns 39-46
my $z = substr($record, 46, 8); # columns 47-54
my $r = substr($record, 60, 6); # columns 47-54
#my $element = substr($record, 76, 2); # columns 77-78
# $number and $element may have leading spaces: strip them
$number =~ s/\s*//g;
#$element =~ s/\s*//g;
$x =~ s/\s*//g;
$y =~ s/\s*//g;
$z =~ s/\s*//g;
$r =~ s/\s*//g;
# Store information in hash
#$results{$number} = [$x,$y,$z,$element];
$results{$number} = [$x,$y,$z,$r];
}
# Return the hash
return %results;
}

Here's one thing that is almost certainly slowing things down:
$x =~ s/\s*//g;
$y =~ s/\s*//g;
$z =~ s/\s*//g;
$r =~ s/\s*//g;
It is possible for \s* to match an empty string, so you are replacing empty strings with empty strings, for each empty string in the target string.
Change to:
$x =~ s/\s+//g;
$y =~ s/\s+//g;
$z =~ s/\s+//g;
$r =~ s/\s+//g;

You have the following definitions:
my $lo = 1000;
my $hi = -1000;
So when you get to your first for loop, you will set $i to 1000, and then fail the check to see if it is less than -1000.

Related

How to make this Perl program print in descending order?

This code works but it prints in ascending order. Do I need to change my whole formula?
print "Enter an integer \n";
my $root = <STDIN>;
my #nums = (100..200);
my $i = 0;
# code in while loop executes as long as condition is true
while ( $i < $#nums )
{
print "$nums[$i]\n",if($nums[$i] % $root == 0); $i++;
}
Just set $i to $#nums instead of 0 and decrement it -- instead of incrementing. You'll need to change the loop condition to $i >= 0 (it should be $i <= $#nums in your code, otherwise it skips 200 when 10 was entered).
#!/usr/bin/perl
use warnings;
use strict;
print "Enter an integer \n";
my $root = <>;
my #nums = (100 .. 200);
my $i = $#nums;
while ( $i >= 0 ) {
print "$nums[$i]\n" if $nums[$i] % $root == 0;
--$i;
}
There are more than a few ways to do it ... not all equally good:
#!/usr/bin/env perl
use strict;
use warnings;
run(#ARGV);
sub run {
my $root = $_[0] // get_root();
my #nums = (100 .. 200);
my #functions = (
sub {
my ($root, $nums) = #_;
my $i = #$nums;
while ($i--) {
print "$nums->[$i]\n" unless $nums->[$i] % $root;
}
return;
},
sub {
my ($root, $nums) = #_;
for my $n ( reverse #$nums ) {
print "$n\n" unless $n % $root;
}
return;
},
sub {
my ($root, $nums) = #_;
my $i;
while ($i++ < #$nums) {
print "$nums->[#$nums - $i]\n" unless $nums->[#$nums - $i] % $root;
}
return;
},
sub {
my ($root, $nums) = #_;
my #multiples = reverse grep !($_ % $root), #$nums;
print "$_\n" for #multiples;
return;
},
);
for my $i ( 0 ... $#functions ) {
print "=== Function $i ===\n";
$functions[$i]->($root, \#nums);
}
}
sub get_root {
return scalar <STDIN>;
}
print "Enter an integer \n";
my $root = <STDIN>;
my #nums = (100..200);
#nums = reverse #nums; #Just reverse the arrays
my $i = 0;
# code in while loop executes as long as condition is true
while ( $i < $#nums )
{
print "$nums[$i]\n",if($nums[$i] % $root == 0); $i++;
}
May be it will helps you.
for (my $i = $#nums; $i >= 0; --$i) { ... }
for (my $i = #nums; $i--; ) { ... }
for my $i (reverse 0 .. $#nums) { ... }
for (1 .. #nums) { my $i = -$_; ... } # Or: my $i = #nums-$_;
for my $num (reverse #nums) { ... }

Transposing the matrix in perl

I am trying to perform a transpose on a data contained in a file. The data is as follows:
1 2 3 4 5
2 3 4 5 6
4 5 6 7 9
4 3 7 6 9
I am getting the result as follows which is incorrect. I am not getting the error in the code due to which the last column is not transposed properly. Any solution...
Code:
#!/usr/bin/perl
use strict;
use warnings;
my #dependent; # matrix of dependent variable
# Reading the data from text file to the matrix
open( DATA, "<example.txt" ) or die "Couldn't open file , $!"; #depenedent
# Storing data into the array in matrix form
while ( my $linedata = <DATA> ) {
push #dependent, [ split '\t', $linedata ];
}
my $m = #dependent;
#print "$m\n";
my $n = #{ $dependent[1] };
#print $n;
#print "Matrix of dependent variables Y \n";
for ( my $i = 0; $i < $m; $i++ ) {
for ( my $j = 0; $j < $n; $j++ ) {
#print $dependent[$i][$j]," ";
}
#print "\n";
}
my #transpose;
for ( my $i = 0; $i < $n; $i++ ) {
for ( my $j = 0; $j < $m; $j++ ) {
$transpose[$i][$j] = $dependent[$j][$i];
}
}
for ( my $i = 0; $i < $n; $i++ ) {
for ( my $j = 0; $j < $m; $j++ ) {
print $transpose[$i][$j], " ";
}
print "\n";
}
chomp your data when you read it, before you split it; your strange output is caused by the last element of each row of the input still having a newline attached.
Just as a side note, DATA isn't a very good name to pick for a filehandle; perl already defines a special builtin filehandle named DATA for reading data that's embedded in a script or a module, so using that name for yourself can lead to confusion :)

How to find range

I have two sets of ranges. Each range is a pair of integers (start and end) representing some sub-range of a single larger range.
I need to determine which ranges from set A overlap with which ranges from set B.
I think your input got truncated, because I can't see any way to get the last rows of your expected output.
But, for the portion that's there, I think you want a script like this:
my %cover;
foreach my $line ( <STDIN> )
{
chomp $line;
my ( $tag, $lo, $hi ) = split /\s+/, $line;
map { $cover{$_}++ } ($lo .. $hi);
}
my $beg = 0;
my $end = 0;
my $cnt = 0;
foreach my $val ( sort { $a <=> $b } keys %cover )
{
if ($cover{$val} != $cnt || $val > $end + 1)
{
if ($cnt > 0)
{
print "chr1\t$beg\t$end\t$cnt\n";
}
$cnt = $cover{$val};
$beg = $val;
$end = $val;
} else
{
$end = $val;
}
}
if ($cnt > 0)
{
print "chr1\t$beg\t$end\t$cnt\n";
}
You didn't tell us, though, what to do with chr1 or how to related it between the input and output (are there other values that could appear there, for example?) so I just hardcoded that in the output. You'll have to change that part appropriately.
Also, my script outputs slightly different ranges than your "expected output," specifically where two ranges abut. My script, for example, outputs
chr1 556 579 1
chr1 580 592 2
but your expected output gives 580 instead of 579 for the first line. I'm not sure your expected output is correct. If you really did want 580 there (which doesn't make a lot of sense), then you could modify the script above to output $end+1 when $val == $end+1. That just seems weird though.
Here's that modified version of the code that gives the weird behavior when ranges abut:
my %cover;
foreach my $line ( <STDIN> )
{
chomp $line;
my ( $tag, $lo, $hi ) = split /\s+/, $line;
map { $cover{$_}++ } ($lo .. $hi);
}
my $beg = 0;
my $end = 0;
my $cnt = 0;
foreach my $val ( sort { $a <=> $b } keys %cover )
{
if ($cover{$val} != $cnt || $val > $end + 1)
{
## unusual value for '$end' when ranges abut.
$end = $val if ($val == $end + 1);
if ($cnt > 0)
{
print "chr1\t$beg\t$end\t$cnt\n";
}
$cnt = $cover{$val};
$beg = $val;
$end = $val;
} else
{
$end = $val;
}
}
if ($cnt > 0)
{
print "chr1\t$beg\t$end\t$cnt\n";
}

Is it somehow possible to catch with Term::TermKey the cursor position?

Is it possible to get hold of the cursor position with Term::TermKey in a similar way Term::ReadKey can do it:
#!/usr/bin/env perl
use warnings;
use 5.12.0;
use Term::ReadKey;
ReadMode 4;
system( 'clear' ) == 0 or die $?;
print "Hello world\n" x 4;
print "go to column 21 -> |";
print "\e[6n";
my ( $x, $y ) = getch();
say "Col: $x - Row: $y";
ReadMode 0;
sub getch {
my $c = ReadKey 0;
if ( $c eq "\e" ) {
my $c = ReadKey 0.10;
if ( $c eq '[' ) {
my $c = ReadKey 0;
if ( $c =~ /\A\d/ ) {
my $c1 = ReadKey 0;
if ( $c1 ne '~' ) {
my $y = 0 + $c;
while ( 1 ) {
last if $c1 eq ';';
$y = 10 * $y + $c1;
$c1 = ReadKey 0;
}
my $x = 0;
while ( 1 ) {
$c1 = ReadKey 0;
last if $c1 eq 'R';
$x = 10 * $x + $c1;
}
return $x, $y;
}
}
}
}
}
Not yet, but I'm working on a plan for it. It will likely be reported as a new event type, looking something like:
use Term::TermKey;
my $tk = Term::TermKey->new;
syswrite STDOUT, "\e[6n";
while( $tk->waitkey( my $key ) ) {
if( $key->type_is_position ) {
printf "The cursor is at %d, %d\n", $key->line, $key->col;
}
}
Requires some extra support in the underlying C library first, including the ability to hook other CSI sequences. Once that's in though it ought to be much easier to support more in the future, such as the many other status reports that come through CSIs.
Edit 2012/04/26: I've now released libtermkey 0.15 and Term::TermKey 0.14, which has this API as described above.

Terminal Control Escape Sequence: how can I catch row and column with "\e[6n"?

How could I catch the row and the column in variables instead of printing out (here ^[[12;1R)?
#!/usr/bin/env perl
use warnings;
use 5.012;
print "\n" x 10;
say "\e[6n";
I've found this:
#!/usr/bin/env perl
use warnings;
use 5.012;
use Term::ReadKey;
ReadMode 4;
system( 'clear' ) == 0 or die $?;
print "\e[6n";
my ( $x, $y ) = getch();
say "Col: $x - Row: $y";
print "Hello world\n" x 6;
print "\e[6n";
( $x, $y ) = getch();
say "Col: $x - Row: $y";
print "String\n" x 5;
print "go to column 21 -> |";
print "\e[6n";
( $x, $y ) = getch();
say "Col: $x - Row: $y";
ReadMode 0;
sub getch {
my $c = ReadKey 0;
if ( $c eq "\e" ) {
my $c = ReadKey 0.10;
if ( $c eq '[' ) {
my $c = ReadKey 0;
if ( $c =~ /\A\d/ ) {
my $c1 = ReadKey 0;
if ( $c1 ne '~' ) {
my $y = 0 + $c;
while ( 1 ) {
last if $c1 eq ';';
$y = 10 * $y + $c1;
$c1 = ReadKey 0;
}
my $x = 0;
while ( 1 ) {
$c1 = ReadKey 0;
last if $c1 eq 'R';
$x = 10 * $x + $c1;
}
return $x, $y;
}
}
}
}
}
Sorry it took a while, I added this feature ages ago then forgot to update the question ;)
use Term::TermKey;
my $termkey = Term::TermKey->new( \*STDIN );
STDOUT->autoflush( 1 );
print "\e[6n";
while( $termkey->waitkey( my $key ) == RES_KEY ) {
if( $key->type_is_position ) {
printf "Position is %d, %d\n", $key->col, $key->line;
last;
}
}