How I can take the coordinates of a block of numbers?

How I can take the coordinates of a block of numbers? - perl

I have a problem tha bothers me a lot...
I have a file with two columns (thanks to your help in a previous question) like:
14430001 0.040
14430002 0.000
14430003 0.990
14430004 1.000
14430005 0.050
14430006 0.490
....................
the first column is coordinates the second probabilities.
I am trying to find the blocks with probability >=0.990 and to be more than 100 in size.
As output I want to be like this:
14430001 14430250
14431100 14431328
18750003 18750345
.......................
where the first column has the coordinate of the start of each block and the second the end of it.
I wrote this script:
use strict;
#use warnings;
use POSIX;
my $scores_file = $ARGV[0];
#finds the highly conserved subsequences
open my $scores_info, $scores_file or die "Could not open $scores_file: $!";
#open(my $fh, '>', $coords_file) or die;
my $count = 0;
my $cons = "";
my $newcons = "";
while( my $sline = <$scores_info>) {
my #data = split('\t', $sline);
my $coord = $data[0];
my $prob = $data[1];
if ($data[1] >= 0.990) {
#$cons = "$cons + '\n' + $sline + '\n'";
$cons = join("\n", $cons, $sline);
# print $cons;
$count++;
if($count >= 100) {
$newcons = join("\n", $newcons, $cons);
my #array = split /'\n'/, $newcons;
print #array;
}
}
else {
$cons = "";
$count = 0;
}
}
It gives me the lines with probability >=0.990 (the first if works) but the coordinates are wrong. When Im trying to print it in a file it stacks, so I have only one sample to check it.
Im terrible sorry if my explanations aren't helpful, but I am new in programming.
Please, I need your help...
Thank you very much in advance!!!

You seem to be using too much variables. Also, after splitting the array and assigning its parts to variables, use the new variables rather than the original array.
sub output {
my ($from, $to) = #_;
print "$from\t$to\n";
}
my $threshold = 0.980; # Or is it 0.990?
my $count = 0;
my ($start, $last);
while (my $sline = <$scores_info>) {
my ($coord, $prob) = split /\t/, $sline;
if ($prob >= $threshold) {
$count++;
defined $start or $start = $coord;
$last = $coord;
} else {
output($start, $last) if $count > 100;
undef $start;
$count = 0;
}
}
output($start, $last) if $count > 100;
(untested)

Related

How can I calculate the geometric center of a protein in Perl?

I have a PDB file which contains information about a specific protein. One of the information it holds is the positions of the different atoms (xyz coordinates).
The file is the following https://files.rcsb.org/view/6U9D.pdb . With this file I want to calculate the geometric center of the atoms. In theory I know what I need to do, but the script I wrote does not seem to work.
The first part of the program, before the for loop, is another part of the assignment which requires me to read the sequence and convert it from the 3 letter nomenclature to the 1 letter one. The part that interests me is the for loop until the end. I tried to pattern match in order to isolate the XYZ coordinates. Then I used a counter that I had set up in the beginning which is the $k variable. When I check the output on cygwin the only output I get is the sequence 0 0 0 instead of the sum of each dimension divided by $k. Any clue what has gone wrong?
$k=0;
open (IN, '6U9D.pdb.txt');
%amino_acid_conversion = (
ALA=>'A',TYR=>'Y',MET=>'M',LEU=>'L',CYS=>'C',
GLY=>'G',ARG=>'R',ASN=>'N',ASP=>'D',GLN=>'Q',
GLU=>'E',HIS=>'H',TRP=>'W',LYS=>'K',PHE=>'F',
PRO=>'P',SER=>'S',THR=>'T',ILE=>'I',VAL=>'V'
);
while (<IN>) {
if ($_=~m/HEADER\s+(.*)/){
print ">$1\n"; }
if ($_=~m/^SEQRES\s+\d+\s+\w+\s+\d+\s+(.*)/){
$seq.=$1;
$seq=~s/ //g;
}
}
for ($i=0;$i<=length $seq; $i+=3) {
print "$amino_acid_conversion{substr($seq,$i,3)}";
if ($_=~m/^ATOM\s+\d+\s+\w+\s+\w+\s+\w+\s+\d+\s+(\S+)\s+(\S+)\s+(\S+)/) {
$x+=$1; $y+=$2; $z+=$3; $k++;
}
}
print "\n";
#print $k;
$xgk=($x/$k); $ygk=($y/$k); $zgk=($z/$k);
print "$xgk $ygk $zgk \n";

I do not know bioinformatics but it seems like you should do something like this:
use feature qw(say);
use strict;
use warnings;
my $fn = '6U9D.pdb';
open ( my $IN, '<', $fn ) or die "Could not open file '$fn': $!";
my $seq = '';
my $x = 0;
my $y = 0;
my $z = 0;
my $k = 0;
while (<$IN>) {
if ($_ =~ m/HEADER\s+(.*)/) {
say ">$1";
}
if ($_=~m/^SEQRES\s+\d+\s+\w+\s+\d+\s+(.*)/){
$seq .= $1;
}
if ($_ =~ m/^ATOM\s+\d+\s+\w+\s+\w+\s+\w+\s+\d+\s+(\S+)\s+(\S+)\s+(\S+)/) {
$x+=$1; $y+=$2; $z+=$3; $k++;
}
}
close $IN;
$seq =~ s/ //g;
my %amino_acid_conversion = (
ALA=>'A',TYR=>'Y',MET=>'M',LEU=>'L',CYS=>'C',
GLY=>'G',ARG=>'R',ASN=>'N',ASP=>'D',GLN=>'Q',
GLU=>'E',HIS=>'H',TRP=>'W',LYS=>'K',PHE=>'F',
PRO=>'P',SER=>'S',THR=>'T',ILE=>'I',VAL=>'V'
);
my %unknown_keys;
my $conversion = '';
say "Sequence length: ", length $seq;
for (my $i=0; $i < length $seq; $i += 3) {
my $key = substr $seq, $i, 3;
if (exists $amino_acid_conversion{$key}) {
$conversion.= $amino_acid_conversion{$key};
}
else {
$unknown_keys{$key}++;
}
}
say "Conversion: $conversion";
say "Unknown keys: ", join ",", keys %unknown_keys;
say "Number of atoms: ", $k;
my $xgk=($x/$k);
my $ygk=($y/$k);
my $zgk=($z/$k);
say "Geometric center: $xgk $ygk $zgk";
This gives me the following output:
[...]
Number of atoms: 76015
Geometric center: 290.744642162734 69.196842162731 136.395842938893

error of "Use of uninitialized value"

previously I asked about a problem regarding my homework to calculate the alignment score of two protein sequences:
how to pass a variable to different if statement
However, I missed a significant amount of information in the question so I have to rewrite the code.
Now my code is done, but I have a warning as Use of uninitialized value $sequence2_new in split at alignment_sequence.pl line 52. Following this warning, there are numerous warning as Use of uninitialized value in string eq at alignment_sequence.pl line 56.I understand it because I have an uninitialized value, but I can't understand how it is uninitialized. My code is following:
#!/usr/in/perl
use warnings;
use strict;
use feature qw(say);
my $infile1 = "cystic_fibrosis.fasta";
my $inFH1;
unless (open($inFH1, "<", $infile1)){
die join (' ', "Can't open", $infile1, $!)
}
my #seq = <$inFH1>;
close $inFH1;
shift #seq;
my $sequence1 = join("", #seq);
$sequence1 =~ s/\n//g;
#parse the query sequence
my $infile2 = "sequence_collection.fasta";
my $inFH2;
unless (open($inFH2, "<", $infile2)){
die join (' ', "Can't open", $infile2, $!)
}
my $beginning = 1;
my #sequence2;
my $sequence2 = "";
while (my $line = <$inFH2>){
chomp $line;
my $chr = substr($line, 0, 1);
if ($chr ne ">"){
$sequence2 = $sequence2.$line;
}else {
if (! $beginning){
push #sequence2, $sequence2;
$sequence2 = "";
}elsif ($beginning){
$beginning = 0;
}
}
}
close $inFH2;
#parse multiple sequence
my $element = scalar(#sequence2);
for ($a = 0; $a < $element; $a++ ){
my $sequence2_new;
if (length $sequence1 < length $sequence2[$a]){
$sequence2_new = substr ($sequence2[$a], 0, length $sequence1);
}
my #sequence1_new = split('', $sequence1);
my #sequence2_new = split('', $sequence2_new);
my $element_new = scalar(#sequence1_new);
my $num = 0;
for ($b = 0; $b < $element_new; $b++){
if ($sequence1_new[$b] eq $sequence2_new[$b]){
$num++;
}
}
my $score = $num / length $sequence1;
say $sequence1;
say $sequence2[$a];
say "\n";
print "The alignment score is: ";
printf("%.2f", $score);
say "\n\n";
}
The file cystic_fibrosis.fasta contains:
>gi|90421313|ref|NP_000483.3| cystic fibrosis transmembrane conductance regulator [Homo sapiens]
MQRSPLEKASVVSKLFFSWTRPILRKGYRQRLELSDIYQIPSVDSADNLSEKLEREWDRELASKKNPKLI
NALRRCFFWRFMFYGIFLYLGEVTKAVQPLLLGRIIASYDPDNKEERSIAIYLGIGLCLLFIVRTLLLHP
AIFGLHHIGMQMRIAMFSLIYKKTLKLSSRVLDKISIGQLVSLLSNNLNKFDEGLALAHFVWIAPLQVAL
LMGLIWELLQASAFCGLGFLIVLALFQAGLGRMMMKYRDQRAGKISERLVITSEMIENIQSVKAYCWEEA
MEKMIENLRQTELKLTRKAAYVRYFNSSAFFFSGFFVVFLSVLPYALIKGIILRKIFTTISFCIVLRMAV
TRQFPWAVQTWYDSLGAINKIQDFLQKQEYKTLEYNLTTTEVVMENVTAFWEEGFGELFEKAKQNNNNRK
TSNGDDSLFFSNFSLLGTPVLKDINFKIERGQLLAVAGSTGAGKTSLLMVIMGELEPSEGKIKHSGRISF
CSQFSWIMPGTIKENIIFGVSYDEYRYRSVIKACQLEEDISKFAEKDNIVLGEGGITLSGGQRARISLAR
AVYKDADLYLLDSPFGYLDVLTEKEIFESCVCKLMANKTRILVTSKMEHLKKADKILILHEGSSYFYGTF
SELQNLQPDFSSKLMGCDSFDQFSAERRNSILTETLHRFSLEGDAPVSWTETKKQSFKQTGEFGEKRKNS
ILNPINSIRKFSIVQKTPLQMNGIEEDSDEPLERRLSLVPDSEQGEAILPRISVISTGPTLQARRRQSVL
NLMTHSVNQGQNIHRKTTASTRKVSLAPQANLTELDIYSRRLSQETGLEISEEINEEDLKECFFDDMESI
PAVTTWNTYLRYITVHKSLIFVLIWCLVIFLAEVAASLVVLWLLGNTPLQDKGNSTHSRNNSYAVIITST
SSYYVFYIYVGVADTLLAMGFFRGLPLVHTLITVSKILHHKMLHSVLQAPMSTLNTLKAGGILNRFSKDI
AILDDLLPLTIFDFIQLLLIVIGAIAVVAVLQPYIFVATVPVIVAFIMLRAYFLQTSQQLKQLESEGRSP
IFTHLVTSLKGLWTLRAFGRQPYFETLFHKALNLHTANWFLYLSTLRWFQMRIEMIFVIFFIAVTFISIL
TTGEGEGRVGIILTLAMNIMSTLQWAVNSSIDVDSLMRSVSRVFKFIDMPTEGKPTKSTKPYKNGQLSKV
MIIENSHVKKDDIWPSGGQMTVKDLTAKYTEGGNAILENISFSISPGQRVGLLGRTGSGKSTLLSAFLRL
LNTEGEIQIDGVSWDSITLQQWRKAFGVIPQKVFIFSGTFRKNLDPYEQWSDQEIWKVADEVGLRSVIEQ
FPGKLDFVLVDGGCVLSHGHKQLMCLARSVLSKAKILLLDEPSAHLDPVTYQIIRRTLKQAFADCTVILC
EHRIEAMLECQQFLVIEENKVRQYDSIQKLLNERSLFRQAISPSDRVKLFPHRNSSKCKSKPQIAALKEE
TEEEVQDTRL
The file sequence_collection.fasta contains 100 similar blocks:
>gi|1100985|gb|AAC48608.1| CFTR chloride channel [Oryctolagus cuniculus]
MQRSPLEKAGVLSKLFFSWTRPILRKGYRQRLELSDIYQIPSADSADNLSEKLEREWDRELASKKNPKLI
NALRRCFFWRFMFYGIFLYLGEVTKAVQPLLLGRIIASYDPDNKEERSIAIYLGIGLCLLFVVRTLLLHP
AIFGLHHIGMQMRIAMFSLIYKKGLALAHFVWISPLQVTLLMGLLWELLQASAFCGLAFLIVLALVQAGL
GRMMMKYRDQRAGKINERLVITSEMIENIQSVKAYCWEEAMEKMIENLRQTELKLTRKAAYVRYFNSSAF
FFSGFFVVFLSVLPYALTKGIILRKIFTTISFCIVLRMAVTRQFPWAVQTWYDSLGAINKIQDFLQKQEY
KTLEYNLTTTEVVMDNVTAFWEEGFGELFEKAKQNNSDRKISNGDNNLFFSNFSLLGAPVLEDISFKIER
GQLLAVAGSTGAGKTSLLMMITGELEPSEGKIKHSGRISFCSQFSWIMPGTIKENIIFGVSYDEYRYRSV
IKACQLEEDISKFTEKDNTVLGEGGITLSGGQRARISLARAVYKDADLYLLDSPFGYLDVLTEKEIFESC
VCKLMANKTRIMVTSKMEHLKKADKILILHEGSSYFYGTFSELQSLRPDFSSKLMGYDSFDQFSAERRNS
ILTETLRRFSLEGDASVSWNDTRKQSFKQNGELGEKRKNSILNPVNSMRKFSIVLKTPLQMNGIEEDSDA
TIERRLSLVPDSEQGEAILPRSNMINTGPMLQGCRRQSVLNLMTHSVSQGPSIYRRTTTSTRKMSLAPQT
NLTEMDIYSRRLSQESGLEISEEINEEDLKECFIDDVDSIPTVTTWNTYLRYITVHRSLIFVLIWCIVIF
LAEVAASLVVLWLFGNTAPQDKENSTKSGNSSYAVIITNTSSYYFFYIYVGVADTLLALGLFRGLPLVHT
LITVSKILHHKMLHSVLQAPMSTLNTLKAGGILNRFSKDIAILDDLLPLTIFDFIQLLLIVVGAIAVVSV
LQPYIFLATVPVIAAFILLRAYFLHTSQQLKQLESEGRSPIFTHLVTSLKGLWTLRAFGRQPYFETLFHK
ALNLHTANWFLYLSTLRWFQMRIEMIFVLFFIAVAFISILTTGEGEGRVGIILTLAMNIMSTLQWAVNSS
IDVDSLMQSVSRVFMFIDMPTEAKSTKSIKPSSNCQLSKVMIIENQHVKKDDVWPSGGQMTVKGLTAKYI
DSGNAILENISFSISPGQRVGLLGRTGSGKSTLLSAFLRLLSTEGEIQIDGVSWDSITLQQWRKAFGVIP
QKVFIFSGTFRKNLDPYEQWSDQEIWKVADEVGLRSVIEQFPGKLDFVLVDGGYVLSHGHKQLMCLARSV
LSKAKILLLDEPSAHLDPITYQIIRRTLKQAFADCTVILCEHRIEAMLECQRFLVIEENTVRQYESIQKL
LSEKSLFRQAISSSDRAKLFPHRNSSKHKSRPQITALKEEAEEEVQGTRL
Sorry I know it's very redundant. I will greatly appreciate any suggestions.

In this code, you do not assign a value to $sequence2_new when the if condition is false. That means it remains undef.
my $sequence2_new;
if (length $sequence1 < length $sequence2[$a]){
$sequence2_new = substr ($sequence2[$a], 0, length $sequence1);
}
Try assigning a value to it:
my $sequence2_new = '';
if (length $sequence1 < length $sequence2[$a]){
$sequence2_new = substr ($sequence2[$a], 0, length $sequence1);
}

Range overlapping number and percentage calculation

I want to calculate overlap number(#) and percentage (%) from a series of ranges values distributed in four different files initiated with a specific identifier(id) (like NP_111111.4) . The initial list of ids are taken from file1.txt (starting file) and if the id matches with ids of other file, overlaps are calculated. Suppose my files are like this:
file1.txt
NP_111111.4: 1-9 12-20 30-41
YP_222222.2: 3-30 40-80
file2.txt
NP_111111.4: 1-6, 13-22, 31-35, 36-52
NP_414690.4: 360-367, 749-755
YP_222222.2: 19-24, 22-40
file3.txt
NP_418214.2: 1-133, 135-187, 195-272
YP_222222.2: 1-10
file4.txt
NP_418119.2
YP_222222.2 GO:0016878, GO:0051108
NP_111111.4 GO:0005887
From these input file, I want to create a .csv or excel output with separate columns with header as:
id overlap_file1_file2(#) overlap_file1_file2(%) overlap_file1_file3(#) overlap_file1_file3(%) overlap_file1_file2_file3(#) overlap_file1_file2_file3(%) Go_Terms(File4)
I am learning perl and found a perl module "strictures" for this type of range comparison. I am calculating overlapping number and percentage from two ranges as:
#!/usr/bin/perl
use strictures;
use Number::Range;
my $seq1 = Number::Range->new(8..356); #Start and stop for file1.txt
my $seq2 = Number::Range->new(156..267); #Start and stop for file2.txt
my $overlap = 0;
my $sseq1 = $seq1->size;
my $percent = (($seq2->size * 100) / $seq1->size);
foreach my $int ($seq2->range) {
if ( $seq1->inrange($int) ) {
$overlap++;
}
else {
next;
}
}
print "Total size= $sseq1 Number overlapped= $overlap Percentage overlap= $percent \n";
But I could not find a way to match ids of (file1.txt) with other files to extract specific information and to print them in a output csv file.
Please help. Thanks for your consideration.

This is a fragile solution in that it can only check 3 files for overlaps. If more files are involved, the code would need to be restructured. It uses Set::IntSpan to calculate the overlaps (and percent of overlaps.
#!/usr/bin/perl
use strict;
use warnings;
use Set::IntSpan;
use autodie;
my $file1 = 'file1';
my #files = qw/file2 file3/;
my %data;
my %ids;
open my $fh1, '<', $file1;
while (<$fh1>) {
chomp;
my ($id, $list) = split /:\s/;
$ids{$id}++;
$data{$file1}{$id} = Set::IntSpan->new(split ' ', $list);
}
close $fh1;
for my $file (#files) {
open my $fh, '<', $file;
while (<$fh>) {
chomp;
my ($id, $list) = split /:\s/;
next unless exists $ids{$id};
$data{$file}{$id} = Set::IntSpan->new(split /,\s/, $list);
}
close $fh;
}
my %go_terms;
open my $go, '<', 'file4';
while (<$go>) {
chomp;
my ($id, $terms) = split ' ', $_, 2;
$go_terms{$id} = $terms =~ tr/,//dr;
}
close $go;
my %output;
for my $file (#files) {
for my $id (keys %ids) {
my $count = ($data{$file1}{$id} * $data{$file}{$id})->size;
my $percent = sprintf "%.0f", 100 * $count / $data{$file1}{$id}->size;
$output{$id}{$file} = [$count, $percent];
}
}
for my $id (keys %ids) {
my $count = ($data{$file1}{$id} * $data{$files[0]}{$id} * $data{$files[1]}{$id})->size;
my $percent = sprintf "%.0f", 100 * $count / $data{$file1}{$id}->size;
$output{$id}{all_files} = [$count, $percent];
}
# output saved as f2.csv
print join(",", qw/ID f1f2_overlap f1f2_%overlap
f1f3_overlap f1f3_%overlap
f1f2f3_overlap f1f2f3_%overlap Go_terms/), "\n";
for my $id (keys %output) {
print "$id,";
for my $file (#files, 'all_files') {
my $aref = $output{$id}{$file};
print join(",", #$aref), ",";
}
print +($go_terms{$id} // ''), "\n";
}
The Excel sheet looks like this.

exists statement not working perl

This is a homework assignment. I am not looking for the "code to make it work" more looking for a point in the right direction on where my logic is wrong.
use strict;
use warnings;
#rot13 sub for passwords
sub rot13{
my $result;
chomp(my $input = <STDIN>);``
# all has to be lower case
my $lower = lc $input;
my $leng = length $lower;
for(my $i = 0; $i < $leng; $i++){
my $temp = substr ($lower,$i,1);
my $con = ord $temp;
if($con >= '55'){
if($con >= '110'){
$con -= 13;
}
else{
$con += 13;
}
}
$result = $result . chr $con;
}
return $result;
};
#opening a file specified by the user for input and reading it
#into an array then closing file.
open FILE, $ARGV[0] or die "cannot open input.txt";
my #input = <FILE>;
close FILE;
my (#username,#password,#name,#uid,#shell,#ssn,#dir,#group,#gid);
my $ui = 100;
foreach(#input){
my ($nam, $ss, $gro) = split ('/', $_);
chomp ($gro);
$nam= lc $nam;
I created a hash so I can use the exists function then using the function and if it does exist go to the next round of the loop. I feel like I am missing something with this.
my %nacheck;
if( exists ($nacheck { '$nam' } )){
next;
}
$nacheck{ "$nam" } = 1;
while (my ($key, $value) = each %nacheck){
print "$key => $value\n";
}
All this works for now but any tips on how to do it better would be appreaciated
my($unf, $unm, $unl) = split (/ /, $nam);
$unf = (substr $unf,0,1);
$unm = (substr $unm,0,1);
$unl = (substr $unl,0,1);
my $un = $unf . $unm . $unl;
if(($gro) eq "faculty"){
push #username, $un;
push #gid, "1010";
push #dir, "/home/faculty/$un";
push #shell, "/bin/tcsh";
}
else{
my $lssn = substr ($ss,7,4);
push #username, $un . $lssn;
push #gid, "505";
push #dir, "/home/student/$un";
push #shell, "/bin/bash";
}
#pushing results onto global arrays to print out later
push #ssn, $ss;
my $pass = rot13;
push #password, $pass;
push #name, $nam;
push #uid, $ui += 1;
}
#printing results
for(my $i = 0; $i < #username; $i++){
print
"$username[$i]:$password[$i]:$uid[$i]:$gid[$i]:$name[$i]:$dir[$i]:$shell[$i]\n";
}

The value of the expression '$nam' is those four characters themselves. The value of the expression "$nam" is whatever the value of the variable $nam is, expressed as a string.
Double quotes allow string interpolation. Single quotes do not; you get exactly what you type.

As you've written it:
my %nacheck;
if( exists ($nacheck { '$nam' } )){
next;
}
$nacheck{ "$nam" } = 1;
the %nacheck is newly created and must be empty. Therefore the exists test fails.
Or have you just shown the definition adjacent to the test for the purpose of the example?
If so, can you show us what your code actually looks like?
Edit: Also, as Charles Engelke noted, you've used single-quotes around a variable '$nam' which is wrong.

How do I average column values from a tab-separated data file, ignoring a header row and the left column?

My task is to compute averages from the following data file, titled Lab1_table.txt:
retrovirus genome gag pol env
HIV-1 9181 1503 3006 2571
FIV 9474 1353 2993 2571
KoRV 8431 1566 3384 1980
GaLV 8088 1563 3498 2058
PERV 8072 1560 3621 1532
I have to write a script that will open and read this file, read each line by splitting the contents into an array and computer the average of the numerical values (genome, gag, pol, env), and write to a new file the average from each of the aforementioned columns.
I've been trying my best to figure out how to not take into account the first row, or the first column, but every time I try to execute on the command line I keep coming up with 'explicit package name' errors.
Global symbol #average requires explicit package name at line 23.
Global symbol #average requires explicit package name at line 29.
Execution aborted due to compilation errors.
I understand that this involves # and $, but even knowing that I've not been able to change the errors.
This is my code, but I emphasise that I'm a beginner having started this just last week:
#!/usr/bin/perl -w
use strict;
my $infile = "Lab1_table.txt"; # This is the file path
open INFILE, $infile or die "Can't open $infile: $!";
my $count = 0;
my $average = ();
while (<INFILE>) {
chomp;
my #columns = split /\t/;
$count++;
if ( $count == 1 ) {
$average = #columns;
}
else {
for( my $i = 1; $i < scalar $average; $i++ ) {
$average[$i] += $columns[$i];
}
}
}
for( my $i = 1; $i < scalar $average; $i++ ) {
print $average[$i]/$count, "\n";
}
I'd appreciate any insight, and I would also great appreciate letting me know by list numbering what you're doing at each step - if appropriate. I'd like to learn and it would make more sense to me if I was able to read through what someone's process was.

Here are the points you need to change
Use another variable for the headers
my $count = 0;
my #header = ();
my #average = ();
then change the logic inside if statement
if ( $count == 1 ) {
#header = #columns;
}
Now don't use the #average for the limit, use $i < scalar #columns for else statement.
Initially #average is zero, you will never get inside the for loop ever.
else {
for( my $i = 1; $i < scalar #columns; $i++ ) {
$average[$i] += $columns[$i];
}
}
Finally add -1 to your counter. Remember you increment your counter when you parse your header
for( my $i = 1; $i < scalar #average; $i++ ) {
print $average[$i]/($count-1), "\n";
}
Here is the final code
You can take advantage of #header to display the result neatly
#!/usr/bin/perl -w
use strict;
my $infile = "Lab1_table.txt"; # This is the file path
open INFILE, $infile or die "Can't open $infile: $!";
my $count = 0;
my #header = ();
my #average = ();
while (<INFILE>) {
chomp;
my #columns = split /\t/;
$count++;
if ( $count == 1 ) {
#header = #columns;
}
else {
for( my $i = 1; $i < scalar #columns; $i++ ) {
$average[$i] += $columns[$i];
}
}
}
for( my $i = 1; $i < scalar #average; $i++ ) {
print $average[$i]/($count-1), "\n";
}
There are other ways to write this code but I thought it would be better to just correct your code so that you can easily understand what is wrong with your code. Hope it helps

We Keep Coding

iphone swift flutter scala powershell matlab mongodb postgresql perl eclipse

How I can take the coordinates of a block of numbers? - perl

Related

How can I calculate the geometric center of a protein in Perl?

error of "Use of uninitialized value"

Range overlapping number and percentage calculation

exists statement not working perl

How do I average column values from a tab-separated data file, ignoring a header row and the left column?

Categories

Resources