Transposing the matrix in perl - perl

I am trying to perform a transpose on a data contained in a file. The data is as follows:
1 2 3 4 5
2 3 4 5 6
4 5 6 7 9
4 3 7 6 9
I am getting the result as follows which is incorrect. I am not getting the error in the code due to which the last column is not transposed properly. Any solution...
Code:
#!/usr/bin/perl
use strict;
use warnings;
my #dependent; # matrix of dependent variable
# Reading the data from text file to the matrix
open( DATA, "<example.txt" ) or die "Couldn't open file , $!"; #depenedent
# Storing data into the array in matrix form
while ( my $linedata = <DATA> ) {
push #dependent, [ split '\t', $linedata ];
}
my $m = #dependent;
#print "$m\n";
my $n = #{ $dependent[1] };
#print $n;
#print "Matrix of dependent variables Y \n";
for ( my $i = 0; $i < $m; $i++ ) {
for ( my $j = 0; $j < $n; $j++ ) {
#print $dependent[$i][$j]," ";
}
#print "\n";
}
my #transpose;
for ( my $i = 0; $i < $n; $i++ ) {
for ( my $j = 0; $j < $m; $j++ ) {
$transpose[$i][$j] = $dependent[$j][$i];
}
}
for ( my $i = 0; $i < $n; $i++ ) {
for ( my $j = 0; $j < $m; $j++ ) {
print $transpose[$i][$j], " ";
}
print "\n";
}

chomp your data when you read it, before you split it; your strange output is caused by the last element of each row of the input still having a newline attached.
Just as a side note, DATA isn't a very good name to pick for a filehandle; perl already defines a special builtin filehandle named DATA for reading data that's embedded in a script or a module, so using that name for yourself can lead to confusion :)

Related

how to output into a tree like structure via Tree::DAG_Node?

I'm trying to write a program that can read from a file then put into a heap sort then output in a tree along with the sorted list. But I got stuck, instead of having the put be the numbers used in the heap sort, it just post 1-10 on one line then 1-9 on the next line. I'm really confused and I barely understand this language as it is. I put a example of the out below alone with the output I am hoping to have it draw.
#!/usr/bin/perl
use 5.006;
use strict;
use warnings;
use Tree::DAG_Node;
process_data(read_file('testing.txt'));
process_data((3,1,4,1,5,9,2,6,5,3,6));
sub read_file{
my($filename)=#_;
my #data=();
my #words;
open(my $fh, "<", $filename)
or die "Could not open file: $!\n";
while(<$fh>){
chomp;
#words = split(' ');
foreach my $word(#words){
push #data, $word;
}
}
close $fh;
return #data;
}
sub heap_sort {
my ($a) = #_;
my $n = #$a;
for (my $i = ($n - 2) / 2; $i >= 0; $i--) {
down_heap($a, $n, $i);
}
for (my $i = 0; $i < $n; $i++) {
my $t = $a->[$n - $i - 1];
$a->[$n - $i - 1] = $a->[0];
$a->[0] = $t;
down_heap($a, $n - $i - 1, 0);
}
}
sub down_heap {
my ($a, $n, $i) = #_;
while (1) {
my $j = max($a, $n, $i, 2 * $i + 1, 2 * $i + 2);
last if $j == $i;
my $t = $a->[$i];
$a->[$i] = $a->[$j];
$a->[$j] = $t;
$i = $j;
}
sub max {
my ($a, $n, $i, $j, $k) = #_;
my $m = $i;
$m = $j if $j < $n && $a->[$j] > $a->[$m];
$m = $k if $k < $n && $a->[$k] > $a->[$m];
return $m;
}
}
sub draw_tree{
my(#data)=#_;
my $root = Tree::DAG_Node->new;
$root->name($_[0]);
$root->new_daughter->name($_) for (1..10);
my #names = #data;
my $count = 50;
for my $n ($root->daughters) {
for (split //, $names[$count++]) {
$n->new_daughter->name($_)
}
}
print map "$_\n", #{$root->draw_ascii_tree};
}
sub process_data{
my(#data)=#_;
my #a = #data;
print "#a\n";
print "\n";
heap_sort(\#a);
draw_tree(#a);
print "\n";
print "#a\n";
print "\n";
}
here is the output I get
CMD output of my code
Output I am expecting:
(using example data)
1 1 9 4 5
1
|
1 ----- 9
|
/\
4 5
1 1 4 5 9

modify this code to print elements that differ a x number from previous position

I wrote this code to find substrings every x elements:
print "enter file path\n";
$letters = <>;
chomp ($letters);
$sequence = "";
open (LETTERS, $letters) or die "error opening\n";
print "how many letters at a shot\n";
$number = <>;
chomp ($number);
$size = length $sequence;
chomp ($size);
for ($i = 0; $i < $size; $i++) {
$test = substr ($sequence, $i, $number);
print "> Test $i\n";
print "$test\n";
if ($i >= $size - $number) {
last;
}
}
so if I open a file with this string and choose x = 3:
abcdefg
I get this result:
> Test 0 abc
> Test 1 bcd
> Test 2 cde
> Test 3 def
> Test 4 efg
Each substring differs from one position from the previous substring, I'd like to be able to control this number and dislocate the substring by 2 for example. So the result would be:
> Test 0 abc
> Test 2 cde
> Test 4 efg
Any suggestions? Thanks
Add:
...
$step = <>;
...
And change:
for ($i = 0; $i < $size; $i++) {
to
for ($i = 0; $i < $size; $i+=$step) {
for ($i = 0; $i < $size; $i += 2) {

PERL Script for discerning between cavity and void space in PDB(Protein Database) file

The problem with the following code is only in one function of the code. The problem function is with a comment head and close. This is my first post to StackOverflow so bear with me. The following script has some modules and other functions that I know work by testing them with the problem function commented out but I just cannot seem to get that one function to work. When ran, the script runs until the enviroment kills the execution.
Basically what this program does is takes a PDB file, copies everything out of the PDB file and creates a new one and pastes all of the original input file content into the new file and appends the cavities(coordinates of center of the cavity and the specified probe radius) that the program is supposed to find.
The problem function within the code is supposed to distinguish between a void space within a bound box of the structure and a cavity. Cavities are considered to be a closed space somewhere within the structure. A void space is any space or coordinate within the bounding box of max and min coorindates where there isn't an atom.The cavity must be large enough to fit into a specified probe radius. There is also a specified resolution when searching through the 3D hashtable of coordinates.
Can anyone tell me why my code isn't working. Anything immediate. I have tested and tested and cannot seem to find the error.
Thank you.
#!/usr/bin/perl
# Example 11-6 Extract atomic coordinates from PDB file
use strict;
use warnings;
use BeginPerlBioinfo; # see Chapter 6 about this module
#open file for printing
open(FH,">results.pdb");
open(PDB,"oneAtom.pdb");
while(<PDB>) { print FH $_; }
close(PDB);
# Read in PDB file
my #file = get_file_data('oneAtom.pdb');
# Parse the record types of the PDB file
my %recordtypes = parsePDBrecordtypes(#file);
# Extract the atoms of all chains in the protein
my %atoms = parseATOM ( $recordtypes{'ATOM'} );
#define some variables and get the atom indices stored in atom_numbers array
my #atom_numbers = sort {$a <=> $b} keys %atoms;
my $resolution = 4.;
my $lo = 1000;
my $hi = -1000;
my $p_rad = 1;
my %pass;
#set the grid boundaries
foreach my $l ( #atom_numbers ) {
for my $i (0..2) {
if ( $atoms{$l}[$i] < $lo ) { $lo = $atoms{$l}[$i]; }
if ( $atoms{$l}[$i] > $hi ) { $hi = $atoms{$l}[$i]; }
}
}
$lo = $lo - 2* $resolution;
$hi = $hi + 2* $resolution;
#compute min distance to the pdb structure from each grid point
for ( my $i = $lo ; $i <= $hi ; $i = $i + $resolution ) {
for ( my $j = $lo ; $j <= $hi ; $j = $j + $resolution ) {
for ( my $k = $lo ; $k <= $hi ; $k = $k + $resolution ) {
my $min_dist = 1000000;
foreach my $l ( #atom_numbers ) {
my $distance = sqrt((($atoms{$l}[0]-($i))*($atoms{$l}[0]-($i))) + (($atoms{$l}[1]-($j))*($atoms{$l}[1]-($j))) + (($atoms{$l}[2]-($k))*($atoms{$l}[2]-($k))));
$distance = $distance - ( $p_rad + $atoms{$l}[3] );
if ( $distance < $min_dist ) {
$min_dist = $distance;
}
}
$pass{$i}{$j}{$k} = $min_dist;
if ( $pass{$i}{$j}{$k} > 0 ) {
$pass{$i}{$j}{$k} = 1;
} else { $pass{$i}{$j}{$k} = 0;
}
}
}
}
#define a starting point on the outside of the grid and place first on list of points
#my #point = ();
my $num_cavities = 0;
#define some offsets used to compute neighbors
my %offset = (
1 => [-1*$resolution,0,0],
2 => [1*$resolution,0,0],
3 => [0,-1*$resolution,0],
4 => [0,1*$resolution,0],
5 => [0,0,-1*$resolution],
6 => [0,0,1*$resolution],
);
##########################################################
#function below with problem
##########################################################
my #point = ();
push #point,[$hi,$hi,$hi];
=pod
#do the following while there are points on the list
while ( #point ) {
foreach my $vector ( keys %offset ) { #for each offset vector
my #neighbor = (($point[0][0]+$offset{$vector}[0]),($point[0][1]+$offset{$vector}[1]),($point[0][2]+$offset{$vector}[2])); #compute neighbor point
if ( exists $pass{$neighbor[0]}{$neighbor[1]}{$neighbor[2]} ) { #see if neighbor is in the grid
if ( $pass{$neighbor[0]}{$neighbor[1]}{$neighbor[2]} == 1 ) { #if it is see if its further than the probe radius
push #point,[($point[0][0]+$offset{$vector}[0]),($point[0][1]+$offset{$vector}[1]),($point[0][2]+$offset{$vector}[2])]; #if it is push it onto the list of points
}
}
}
$pass{$point[0][0]}{$point[0][1]}{$point[0][2]} = 0; #eliminate the point just tested from the pass array
shift #point; #move to the next point in the list
}
=cut
##############################################################
# end of problem function
##############################################################
my $grid_ind = $atom_numbers[$#atom_numbers];
for ( my $i = $lo ; $i <= $hi ; $i = $i + $resolution ) {
for ( my $j = $lo ; $j <= $hi ; $j = $j + $resolution ) {
for ( my $k = $lo ; $k <= $hi ; $k = $k + $resolution ) {
if ( $pass{$i}{$j}{$k} == 1 ) {
$grid_ind = $grid_ind + 1;
my $n = sprintf("%5d",$grid_ind);
my $x = sprintf("%7.3f",$i);
my $y = sprintf("%7.3f",$j);
my $z = sprintf("%7.3f",$k);
my $w = sprintf("%6.3f",1);
my $p = sprintf("%6.3f",$p_rad);
print FH "ATOM $n MC CAV $n $x $y $z $w $p \n";
}
}
}
}
close(FH);
exit;
#do the following while there are points on the list
for ( my $i = $lo ; $i <= $hi ; $i = $i + $resolution ) {
for ( my $j = $lo ; $j <= $hi ; $j = $j + $resolution ) {
for ( my $k = $lo ; $k <= $hi ; $k = $k + $resolution ) {
if ( $pass{$i}{$j}{$k} == 1 ) {
push #point,[$i,$j,$k];
$num_cavities++;
while ( #point ) {
foreach my $vector ( keys %offset ) { #for each offset vector
my #neighbor = (($point[0][0]+$offset{$vector}[0]),($point[0][1]+$offset{$vector}[1]),($point[0][2]+$offset{$vector}[2])); #compute neighbor point
if ( exists $pass{$neighbor[0]}{$neighbor[1]}{$neighbor[2]} ) { #see if neighbor is in the grid
if ( $pass{$neighbor[0]}{$neighbor[1]}{$neighbor[2]} == 1 ) { #if it is see if its further than the probe radius
push #point,[($point[0][0]+$offset{$vector}[0]),($point[0][1]+$offset{$vector}[1]),($point[0][2]+$offset{$vector}[2])]; #if it is push it onto the list of points
}
}
}
$pass{$point[0][0]}{$point[0][1]}{$point[0][2]} = 0; #eliminate the point just tested from the pass array
shift #point; #move to the next point in the list
}
}
}
}
}
#print the results
print "\nthe structure has " . $num_cavities . " cavities.\n\n";
#print the point that are left over (these correspond to the cavities)
#for ( my $i = -10 ; $i <= 10 ; $i = $i + $resolution ) {
# for ( my $j = -10 ; $j <= 10 ; $j = $j + $resolution ) {
# for ( my $k = -10 ; $k <= 10 ; $k = $k + $resolution ) {
# print $i . "\t" . $j . "\t" . $k . "\t" . $pass{$i}{$j}{$k} . "\n";
# }
# }
#}
###################################################
# function
###################################################
sub parseATOM {
my($atomrecord) = #_;
use strict;
use warnings;
my %results = ( );
# Turn the scalar into an array of ATOM lines
my(#atomrecord) = split(/\n/, $atomrecord);
foreach my $record (#atomrecord) {
my $number = substr($record, 6, 5); # columns 7-11
my $x = substr($record, 30, 8); # columns 31-38
my $y = substr($record, 38, 8); # columns 39-46
my $z = substr($record, 46, 8); # columns 47-54
my $r = substr($record, 60, 6); # columns 47-54
#my $element = substr($record, 76, 2); # columns 77-78
# $number and $element may have leading spaces: strip them
$number =~ s/\s*//g;
#$element =~ s/\s*//g;
$x =~ s/\s*//g;
$y =~ s/\s*//g;
$z =~ s/\s*//g;
$r =~ s/\s*//g;
# Store information in hash
#$results{$number} = [$x,$y,$z,$element];
$results{$number} = [$x,$y,$z,$r];
}
# Return the hash
return %results;
}
Here's one thing that is almost certainly slowing things down:
$x =~ s/\s*//g;
$y =~ s/\s*//g;
$z =~ s/\s*//g;
$r =~ s/\s*//g;
It is possible for \s* to match an empty string, so you are replacing empty strings with empty strings, for each empty string in the target string.
Change to:
$x =~ s/\s+//g;
$y =~ s/\s+//g;
$z =~ s/\s+//g;
$r =~ s/\s+//g;
You have the following definitions:
my $lo = 1000;
my $hi = -1000;
So when you get to your first for loop, you will set $i to 1000, and then fail the check to see if it is less than -1000.

perl: bignum not working with STDIN

#!/usr/bin/perl
use bignum;
$line = <STDIN>;
( $arr[0], $arr[1], $n ) = split( / /, $line );
$i = 2;
sub func {
while ( $i < $n ) {
$t = $arr[ $i - 1 ];
$arr[$i] = $arr[ $i - 1 ] * $arr[ $i - 1 ] + $arr[ $i - 2 ];
$i = $i + 1;
}
return $arr[ $i - 1 ];
}
print func;
when i am setting manual value for $arr[1] then bignum is working fine but when the value is taken from STDIN then it is being printed as integer no BIGINt. can anyone explain why this is happening.
When you use strings in a numeric context, perl converts them using your C library's atof(). This is not changed by bignum. If you'd like your strings converted in a different manner, I'd recommend Math::BigFloat->new or Math::BigInt->new.

How do I speed up pattern recognition in perl

This is the program as it stands right now, it takes in a .fasta file (a file containing genetic code), creates a hash table with the data and prints it, however, it is quite slow. It splits a string an compares it against all other letters in the file.
use strict;
use warnings;
use Data::Dumper;
my $total = $#ARGV + 1;
my $row;
my $compare;
my %hash;
my $unique = 0;
open( my $f1, '<:encoding(UTF-8)', $ARGV[0] ) or die "Could not open file '$ARGV[0]' $!\n";
my $discard = <$f1>;
while ( $row = <$f1> ) {
chomp $row;
$compare .= $row;
}
my $size = length($compare);
close $f1;
for ( my $i = 0; $i < $size - 6; $i++ ) {
my $vs = ( substr( $compare, $i, 5 ) );
for ( my $j = 0; $j < $size - 6; $j++ ) {
foreach my $value ( substr( $compare, $j, 5 ) ) {
if ( $value eq $vs ) {
if ( exists $hash{$value} ) {
$hash{$value} += 1;
} else {
$hash{$value} = 1;
}
}
}
}
}
foreach my $val ( values %hash ) {
if ( $val == 1 ) {
$unique++;
}
}
my $OUTFILE;
open $OUTFILE, ">output.txt" or die "Error opening output.txt: $!\n";
print {$OUTFILE} "Number of unique keys: " . $unique . "\n";
print {$OUTFILE} Dumper( \%hash );
close $OUTFILE;
Thanks in advance for any help!
It is not clear from the description what is wanted from this script, but if you're looking for matching sets of 5 characters, you don't actually need to do any string matching: you can just run through the whole sequence and keep a tally of how many times each 5-letter sequence occurs.
use strict;
use warnings;
use Data::Dumper;
my $str; # store the sequence here
my %hash;
# slurp in the whole file
open(IN, '<:encoding(UTF-8)', $ARGV[0]) or die "Could not open file '$ARGV[0]' $!\n";
while (<IN>) {
chomp;
$str .= $_;
}
close(IN);
# not sure if you were deliberately omitting the last two letters of sequence
# this looks at all the sequence
my $l_size = length($str) - 4;
for (my $i = 0; $i < $l_size; $i++) {
$hash{ substr($str, $i, 5) }++;
}
# grep in a scalar context will count the values.
my $unique = grep { $_ == 1 } values %hash;
open OUT, ">output.txt" or die "Error opening output.txt: $!\n";
print OUT "Number of unique keys: ". $unique."\n";
print OUT Dumper(\%hash);
close OUT;
It might help to remove searching for information that you already have.
I don't see that $j depends upon $i so you're actually matching values to themselves.
So you're getting bad counts as well. It works for 1, because 1 is the square of 1.
But if for each five-character string you're counting strings that match, you're going
to get the square of the actual number.
You would actually get better results if you did it this way:
# compute it once.
my $lim = length( $compare ) - 6;
for ( my $i = 0; $i < $lim; $i++ ){
my $vs = substr( $compare, $i, 5 );
# count each unique identity *once*
# if it's in the table, we've already counted it.
next if $hash{ $vs };
$hash{ $vs }++; # we've found it, record it.
for ( my $j = $i + 1; $j < $lim; $j++ ) {
my $value = substr( $compare, $j, 5 );
$hash{ $value }++ if $value eq $vs;
}
}
However, it could be an improvement on this to do an index for your second loop
and let the c-level of perl do your matching for you.
my $pos = $i;
while ( $pos > -1 ) {
$pos = index( $compare, $vs, ++$pos );
$hash{ $vs }++ if $pos > -1;
}
Also, if you used index, and wanted to omit the last two characters--as you do, it might make sense to remove those from the characters you have to search:
substr( $compare, -2 ) = ''
But you could do all of this in one pass, as you loop through file. I believe the code
below is almost an equivalent.
my $last_4 = '';
my $last_row = '';
my $discard = <$f1>;
# each row in the file after the first...
while ( $row = <$f1> ) {
chomp $row;
$last_row = $row;
$row = $last_4 . $row;
my $lim = length( $row ) - 5;
for ( my $i = 0; $i < $lim; $i++ ) {
$hash{ substr( $row, $i, 5 ) }++;
}
# four is the maximum we can copy over to the new row and not
# double count a strand of characters at the end.
$last_4 = substr( $row, -4 );
}
# I'm not sure what you're getting by omitting the last two characters of
# the last row, but this would replicate it
foreach my $bad_key ( map { substr( $last_row, $_ ) } ( -5, -6 )) {
--$hash{ $bad_key };
delete $hash{ $bad_key } if $hash{ $bad_key } < 1;
}
# grep in a scalar context will count the values.
$unique = grep { $_ == 1 } values %hash;
You may be interested in this more concise version of your code that uses a global regex match to find all the subsequences of five characters. It also reads the entire input file in one go, and removes the newlines afterwards.
The path to the input file is expected as a parameter on the command line, and the output is sent to STDIN, and can be redirected to a file on the command line, like this
perl subseq5.pl input.txt > output.txt
I've also used Data::Dump instead of Data::Dumper because I believe it to be vastly superior. However it is not a core module, and so you will probably need to install it.
use strict;
use warnings;
use open qw/ :std :encoding(utf-8) /;
use Data::Dump;
my $str = do { local $/; <>; };
$str =~ tr|$/||d;
my %dups;
++$dups{$1} while $str =~ /(?=(.{5}))/g;
my $unique = grep $_ == 1, values %dups;
print "Number of unique keys: $unique\n";
dd \%dups;