Average of three columns, grouped by a specific number in a row, using Perl

I have an input file with the following 5 columns, and I want to average columns 3, 4, and 5 individually over the rows whose 2nd column value is 5, and similarly over the rows whose 2nd column value is 7 and 2.
PHE 5 2 4 6
PHE 5 4 6 4
PHE 5 4 2 8
TRP 7 5 5 9
TRP 7 5 7 1
TRP 7 5 7 3
TYR 2 4 4 4
TYR 2 4 4 0
TYR 2 4 5 3
And I want output like this:
PHE 5 3.3 4 6
TRP 7 5 6.3 4.3
TYR 2 4 4.3 2.3

perl -lane'
$k = join "\t", splice(@F, 0, 2);
$h{$k}{c}++ or push(@r, $k);
$h{$k}{t}[$_] += $F[$_] for 0 .. $#F;
END {
$, ="\t";
for (@r) {
($t, $c) = @{$h{$_}}{"t", "c"};
print $_, map sprintf("%.1f", $_/$c)*1, @$t;
}
}
' file
Output:
PHE 5 3.3 4 6
TRP 7 5 6.3 4.3
TYR 2 4 4.3 2.3
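If the one-liner is too dense, here is a rough sketch of the same idea written out as a standalone script. The sub name average_groups and the inlined sample rows are my own choices for illustration, and it joins fields with spaces rather than tabs; otherwise it follows the same steps (key on the first two columns, sum the rest, divide by the row count).

```perl
use strict;
use warnings;

# Average every column after the first two, grouped by the first two columns.
# Returns one output line per group, in order of first appearance.
sub average_groups {
    my @lines = @_;
    my (%sum, %count, @order);
    for my $line (@lines) {
        my @f = split ' ', $line;
        next unless @f > 2;                      # skip blank or short lines
        my $key = join ' ', splice(@f, 0, 2);    # e.g. "PHE 5"
        push @order, $key unless $count{$key}++; # remember first-seen order
        $sum{$key}[$_] += $f[$_] for 0 .. $#f;   # running per-column totals
    }
    return map {
        my $key = $_;
        # sprintf rounds to one decimal; "+ 0" drops a trailing ".0"
        join ' ', $key,
            map { sprintf('%.1f', $_ / $count{$key}) + 0 } @{ $sum{$key} };
    } @order;
}

my @input = (
    'PHE 5 2 4 6', 'PHE 5 4 6 4', 'PHE 5 4 2 8',
    'TRP 7 5 5 9', 'TRP 7 5 7 1', 'TRP 7 5 7 3',
    'TYR 2 4 4 4', 'TYR 2 4 4 0', 'TYR 2 4 5 3',
);
print "$_\n" for average_groups(@input);
```

With the sample rows from the question this should print the same three lines as the one-liner.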

Nice solution, mpapec.
I started the following solution as an experiment to see if I could code something that would only require a single for loop, with no END block needed. It devolved into 6 for loops instead, and a perfect example of how never to code unless your goal is obfuscation.
Yep, it uses an external module. Yes, it's the stupidest code I'll ever post (I hope). But at the very least it might get a chuckle out of someone. And yep, it works! :)
use Array::Transpose;
use List::Util qw(sum max);
use strict;
use warnings;
my $g;
my $l;
print "$_\n" for map {
join ' ', map {sprintf "%-$_->[0]s", $_->[1]} transpose [$l, $_]
} grep {
$l = [map {max @$_} transpose [[map {length $_} @$_], $l || ()]]
} [qw(Txt Num Ave Ave Ave)], map {
my @c = transpose $_;
[$c[0][0], $c[1][0], map {map {/\./ ? sprintf("%.1f", $_) : $_} sum(@$_) / @$_} @c[2..$#c]]
} map {
$g && $g->[0][0] eq $_->[0] ? (push @$g, $_) && () : ($g = [$_])
} map {[split]} (<DATA>);
__DATA__
PHE 5 2 4 6
PHE 5 4 6 4
PHE 5 4 2 8
TRP 7 5 5 9
TRP 7 5 7 1
TRP 7 5 7 3
TYR 2 4 4 4
TYR 2 4 4 0
TYR 2 4 5 3
Outputs
Txt Num Ave Ave Ave
PHE 5 3.3 4 6
TRP 7 5 6.3 4.3
TYR 2 4 4.3 2.3

Here is a script that doesn't use any modules.
Try this:
#!/usr/bin/env perl
open(DATA, "<input.txt") or die "Couldn't open file input.txt, $!";
my %h=();
my %c=();
print "\n";
while(<DATA>){
my $temp=$_;
if($temp=~m/^([A-Z]{3})\s+([\d]+)\s+([\d]+)\s+([\d]+)\s+([\d]+)/is)
{
my $key=$1;
$h{$key}{1} +=$2;
$h{$key}{2} +=$3;
$h{$key}{3} +=$4;
$h{$key}{4} +=$5;
if($c{$key})
{
$c{$key}++;
}
else
{
$c{$key}=1;
}
}
}
foreach $key (sort(keys %h)) {
#print $key.'='.$h{$key}{1}/$c{$key}." ".$h{$key}{2}/$c{$key}." ".$h{$key}{3}/$c{$key}." ".$h{$key}{4}/$c{$key};
printf("%s %d %.1f %.1f %.1f", $key, $h{$key}{1}/$c{$key},$h{$key}{2}/$c{$key},$h{$key}{3}/$c{$key},$h{$key}{4}/$c{$key});
print "\n";
}
print "\n";
close(DATA);
______OUTPUT________
PHE 5 3.3 4.0 6.0
TRP 7 5.0 6.3 4.3
TYR 2 4.0 4.3 2.3

Related

extracted data file containing range function is not active

I have extracted the following data into a file:
0..5
8..10
12..16
But these do not work as ranges. I have stored them in an array:
@arr=('0..5,8..10,12..16');
After printing the array, it gives:
0..5
8..10
12..16
But I need the output to be:
0 1 2 3 4 5
8 9 10
12 13 14 15 16
I don't understand where the problem is. Why doesn't the stored data (..) work as the range operator?
You're starting with string representations of ranges, not actual perl ranges.
To get a perl array, you must convert your data. You could use eval like others have recommended. However, that's like using a machete to perform a haircut.
Instead, I'd advise using more precision tools to extract the range boundaries from the string and then build your new data structure. Using split or a regex could easily pull the values. The following does so using the latter:
use strict;
use warnings;
while (<DATA>) {
chomp;
my ($start, $end) = /(\d+)/g;
my @array = ($start .. $end);
print "@array\n";
}
__DATA__
0..5
8..10
12..16
Outputs:
0 1 2 3 4 5
8 9 10
12 13 14 15 16
Addendum for multiple entries on a row
The following allows for multiple ranges to be on a single row. Note, I'm using split in this version for the sake of variety, although I could have easily used a regex as well:
use strict;
use warnings;
while (<DATA>) {
chomp;
my @array;
for my $range (split ' ') {
my ($start, $end) = split /\.{2}/, $range, 2;
push @array, ($start .. $end);
}
print "@array\n";
}
__DATA__
0..5
4..9 14..18
8..10
12..16
Outputs:
0 1 2 3 4 5
4 5 6 7 8 9 14 15 16 17 18
8 9 10
12 13 14 15 16
Data is data. Perl does not evaluate data as Perl (i.e. expand .. range operator) unless you explicitly tell it to with eval. The following debug session should clarify things for you.
$ perl -de0
Loading DB routines from perl5db.pl version 1.33
Editor support available.
Enter h or `h h' for help, or `man perldebug' for more help.
main::(-e:1): 0
DB<1> @arr = ('0..5,8..10,12..16')
DB<2> p @arr
0..5,8..10,12..16
DB<3> eval "@arr = ('0..5,8..10,12..16')"
DB<4> p @arr
0..5,8..10,12..16
DB<5> @arr = ('0..5','8..10','12..16')
DB<6> p @arr
0..58..1012..16
DB<7> @arr = eval "(0..5,8..10,12..16);"
DB<8> x @arr
0 0
1 1
2 2
3 3
4 4
5 5
6 8
7 9
8 10
9 12
10 13
11 14
12 15
13 16
DB<9>
If you want Perl to expand string ranges into Perl ranges, you must eval that data.
use strict;
use warnings;
use feature qw(say);
my @arr=('0..5','8..10','12..16');
foreach my $range (@arr) {
say join ' ', eval ($range);
}
__END__
0 1 2 3 4 5
8 9 10
12 13 14 15 16
Try this to store the values in the array:
@arr=((0..5),(8..10),(12..16));

Perl - printing output after a for loop

I have this code where I have problem getting the output I want:
print OUT1 "$first[0]\t$first[1]\t$first[2]\t";
for my $index1 (3..8)
{
my $ratio1 = sprintf( "%.4f%s", $numerator/$denominator,"\t");
#print OUT1 "$ratio1";
$variable1 = "$ratio1"; # problem with this line
}
print OUT1 "$variable1"; # print to textfile
print OUT1 "\n";
I am trying to print the output after the for loop has run 6 times (3 to 8). The data should be arranged something like this:
Desired output (e.g.):
A B C 4 4 4 4 4 4
C D F 2 6 5 8 3 1
G H I 6 1 2 4 7 0
Instead, it prints out only the last column:
A B C 4
C D F 1
G H I 0
So I changed the line, adding the "." to join the 6 columns together:
$variable1 .= "$ratio1"; # problem with this line
and I get weird output like this:
A B C 4 4 4 4 4 4
A B C 4 4 4 4 4 4 2 6
A B C 4 4 4 4 4 4 2 6 4
A B C 4 4 4 4 4 4 2 6 4 6
A B C 4 4 4 4 4 4 2 6 4 6 1 ...
Is there anything wrong with my code somewhere?
Yes, you need to declare and initialize your $variable1 before your for loop, so that it starts out empty for every output row:
print OUT1 "$first\t$second\t$third\t";
my $variable1 = '';
for my $index1 (3..8)
{
$variable1 .= sprintf( "%.4f%s", $numerator/$denominator,"\t");
}
print OUT1 "$variable1"; # print to textfile
Alternatively, you could avoid a temporary buffer and just print within the for loop.
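A minimal sketch of that second approach, printing each ratio as soon as it is computed. The sub name print_row, the sample rows, and the constant denominator of 2 are assumptions for illustration only; the question does not show how $numerator and $denominator are derived, so here each of fields 3..8 is simply divided by the denominator.

```perl
use strict;
use warnings;

# Print one row: three labels, then six ratios, each written as it is computed.
# $fh is any output filehandle; @fields holds 3 labels followed by 6 numbers.
sub print_row {
    my ($fh, $denominator, @fields) = @_;
    print {$fh} join("\t", @fields[0 .. 2]), "\t";
    for my $index (3 .. 8) {
        # no buffer variable, so nothing carries over into the next row
        printf {$fh} "%.4f\t", $fields[$index] / $denominator;
    }
    print {$fh} "\n";
}

# Hypothetical sample rows, written to STDOUT instead of OUT1.
print_row(\*STDOUT, 2, qw(A B C 8 8 8 8 8 8));
print_row(\*STDOUT, 2, qw(C D F 4 12 10 16 6 2));
```

Because nothing is accumulated between rows, the "growing line" problem from the question cannot occur here.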

Iterate a program, erasing data from the original set

I am trying to study an algorithm in which, given a list of numbers, I must calculate a coefficient given by the ratio between the number of triangles found in the data list and the minimum number of neighbors that a number has. For example, given the first two rows of the file:
1 2 3 4 5 6 9
2 1 3
...
If the first element of a row appears in another row, and the first element of that other row appears in the row under examination, then I have found a link.
If the link exists, I want to count how many of the other elements in the row under examination also appear in the row where the link is present, and print "I have found z triangles".
For example, in this case, when the program compares the first row with the second row, it finds that the link 1 2 exists and that there is 1 triangle, made by the vertices 1, 2, 3.
In the algorithm I have to divide the number of triangles + 1 by the minimum number of elements in each row - 2 (in this case the minimum comes from the second line and the value is 3 - 2 = 1); the coefficient that I am looking for is then (1 + 1)/1 = 2.
The output file will be written as:
1 2 2
where the first two columns hold the elements that make a link and the 3rd column holds the value of the coefficient.
Here is the code I've written so far:
use strict;
use warnings;
use 5.010;
use List::Util;
my $filename = "data";
open my $fh, '<', $filename or die "Cannot open $filename: $!";
my $output_file = "output_example";
open my $fi, ">", $output_file or die "Error during $output_file opening: $!";
my %vector;
while (<$fh>) {
my @fields = split;
my $root = shift @fields;
$vector{$root} = { map { $_ => 1} @fields };
}
my @roots = sort { $a <=> $b } keys %vector;
for my $i (0 .. $#roots) {
my $aa = $roots[$i];
my $n_element_a = scalar (keys %{$vector{$aa}})-1;
for my $j ($i+1 .. $#roots) {
my $minimum;
my $bb = $roots[$j];
my $n_element_b = scalar (keys %{$vector{$bb}})-1;
next unless $vector{$aa}{$bb} and $vector{$bb}{$aa};
if ($n_element_a < $n_element_b){
$minimum = $n_element_a;
}else {
$minimum = $n_element_b;
}
my $triangles = 0;
for my $cc ( keys %{$vector{$aa}} ) {
next if $cc == $aa or $cc == $bb;
if ($vector{$bb}{$cc}) {
$triangles++;
}
}
my $coeff;
my @minimum_list;
if ($minimum == 0){
$coeff = ($triangles +1)/($minimum+0.00000000001);
} else {
$coeff = ($triangles +1)/($minimum);
}
say $fi "$aa $bb $coeff";
}
}
__DATA__
1 2 3 4 5 6 9
2 1 3
3 1 2
4 1 5
5 1 4
6 1 7 8
8 6 7
9 1 10 11
10 9 11 12 14
11 9 10 12 13
12 10 13 14
13 11 12
14 10 12 15
15 14
I put the entire dataset at the end. The output file gives:
__OUTPUT__
1 2 2
1 3 2
1 4 2
1 5 2
1 6 0.5
1 9 0.5
2 3 2
4 5 2
6 8 2
9 10 1
9 11 1
10 11 1
10 12 1
10 14 1
11 13 2
12 13 1
12 14 1
14 15 100000000000
Now I would like to find the minimum value of the coefficient, identify the link(s) that have this lowest value, erase these elements from the original dataset, and repeat the same program on the "new" dataset.
For example, in this case the links with the minimum value are 1 6 and 1 9, with a coefficient of 0.5. So now the program should delete, in the data file, the element "6" in the row that starts with "1" and vice versa, and do the same with the 9. The "new" dataset would then be:
1 2 3 4 5
2 1 3
3 1 2
4 1 5
5 1 4
6 7 8
8 6 7
9 10 11
10 9 11 12 14
11 9 10 12 13
12 10 13 14
13 11 12
14 10 12 15
15 14
What I am looking for is:
How can I erase the elements that have the minimum coefficient value from the dataset contained in the data file?
How can I iterate the process N times?
To find the minimum from the output file, I thought of adding these lines at the end of the program:
my $file1 = "output_example";
open my $fg, "<", $file1 or die "Error during $file1 opening: $!";
my @minimum_vector;
while (<$fg>) {
push @minimum_vector, [ split ];
}
my $minima=$minimum_vector[0][2];
for my $i (0 .. $#minimum_vector){
if($minima >= $minimum_vector[$i][2] ){
$minima=$minimum_vector[$i][2];
}
}
say $minima;
close $file1;
But it gives me an error with $minima, because I think I can't read from the same file that I have just created (in this case the output_example file). It works if I run it as a separate program.
The best way to iterate would probably be to break your code up into subroutines. This will also help clarify the code and track down exactly where problems might be occurring.
use strict;
use warnings;
use 5.010;
use List::Util qw/min/;
sub load_initial_data {
# open and read file, load it into an arrayref and return it.
}
sub find_coefficients {
my $data = shift;
my @results;
foreach my $row (@$data) {
# do stuff to calculate $aa, $bb, $coeff
push @results, [$aa, $bb, $coeff];
}
return \@results;
}
sub filter_data {
my $data = shift;
my $coefficients = shift;
my $min = min map { $_->[2] } @$coefficients;
my @deletions = grep { $min == $_->[2] } @$coefficients;
foreach my $del (@deletions) {
delete( $data->{$del->[0]}{$del->[1]} );
}
return $data; # hand the filtered structure back to the caller
}
# doing the actual work:
my $data = load_initial_data();
my $coeffs;
foreach my $pass (1 .. $N) { # $N is the number of passes; declare and set it beforehand
$coeffs = find_coefficients( $data );
$data = filter_data( $data, $coeffs );
# You could write $data and/or $coeffs out to a file here
# if you need to keep the intermediate stages
}
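To make the skeleton concrete, here is a rough, self-contained sketch of one way the pieces could fit together. The sub names load_data and remove_weakest_links, the variable $passes, and the in-line data rows are my own assumptions; the coefficient calculation follows the code in the question, including its tiny-divisor guard. Unlike the skeleton above, this version deletes the link in both directions, since the question asks for that.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Parse rows of the form "root neighbour neighbour ..." into a hash of hashes.
sub load_data {
    my %graph;
    for my $row (@_) {
        my ($root, @neighbours) = split ' ', $row;
        next unless defined $root;
        $graph{$root} = { map { $_ => 1 } @neighbours };
    }
    return \%graph;
}

# Return [a, b, coefficient] for every mutual link a-b with a < b.
sub find_coefficients {
    my ($graph) = @_;
    my @roots = sort { $a <=> $b } keys %$graph;
    my @results;
    for my $i (0 .. $#roots) {
        my $aa = $roots[$i];
        for my $j ($i + 1 .. $#roots) {
            my $bb = $roots[$j];
            next unless $graph->{$aa}{$bb} and $graph->{$bb}{$aa};
            # smaller neighbour count of the two ends, not counting the link itself
            my $minimum = min(scalar(keys %{ $graph->{$aa} }),
                              scalar(keys %{ $graph->{$bb} })) - 1;
            # common neighbours of both ends form the triangles
            my $triangles = grep {
                $_ != $aa and $_ != $bb and $graph->{$bb}{$_}
            } keys %{ $graph->{$aa} };
            # same guard as in the question: avoid dividing by zero
            my $coeff = ($triangles + 1) / ($minimum || 0.00000000001);
            push @results, [ $aa, $bb, $coeff ];
        }
    }
    return \@results;
}

# Delete every link whose coefficient equals the minimum, in both directions.
# Returns the list of removed links.
sub remove_weakest_links {
    my ($graph, $coefficients) = @_;
    return unless @$coefficients;
    my $min = min map { $_->[2] } @$coefficients;
    my @weakest = grep { $_->[2] == $min } @$coefficients;
    for my $link (@weakest) {
        my ($aa, $bb) = @$link;
        delete $graph->{$aa}{$bb};
        delete $graph->{$bb}{$aa};
    }
    return @weakest;
}

my @rows = (
    '1 2 3 4 5 6 9', '2 1 3', '3 1 2', '4 1 5', '5 1 4', '6 1 7 8', '8 6 7',
    '9 1 10 11', '10 9 11 12 14', '11 9 10 12 13', '12 10 13 14', '13 11 12',
    '14 10 12 15', '15 14',
);
my $graph  = load_data(@rows);
my $passes = 2;    # hypothetical N; pick whatever you need

for my $pass (1 .. $passes) {
    my $coeffs  = find_coefficients($graph);
    my @removed = remove_weakest_links($graph, $coeffs);
    last unless @removed;    # nothing left to remove
    print "pass $pass: removed ",
        join(', ', map { $_->[0] . '-' . $_->[1] } @removed),
        " (coefficient $removed[0][2])\n";
}

# Print the remaining data in the original row format.
for my $root (sort { $a <=> $b } keys %$graph) {
    print join(' ', $root, sort { $a <=> $b } keys %{ $graph->{$root} }), "\n";
}
```

Keeping everything in memory also sidesteps the problem of reading back a file that is still open for writing. On the first pass this should remove the links 1-6 and 1-9, matching the "new" dataset shown in the question.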

Want to filter for the max result and print from a table that contains many results for multiple scenarios

I have a CSV table where I have the merged data for 1024 independent variables and 25 dependent variables that are associated with them. For each independent variable (called 1 .. 1024), I have 10 different outcomes. I would like to
choose the best result for each independent variable, and
pipe the line containing that information into a new CSV file.
It seems like a fairly easy thing to ask of perl, and maybe it would be simple to do with a hash of an array of an array, but I'm still confused about how I could implement something like that for this collection of data.
Current code
I found a very helpful Q&A from 2009 on printing matching lines. It works fairly well after some tinkering, but a few issues remain:
I have to pre-sort the file so that my maximum value is the first value that appears for each case.
I also miss out on getting the best result for the first independent variable and
in some instances I get multiple lines returned to me instead of just the maximum value.
I'm fairly sure there must be an easier way to do this, and I would greatly appreciate any help and/or constructive criticism on my (ripped-off) script.
Thank you!
This is what I have so far:
#!/usr/bin/perl
use warnings;
use strict;
unless ($#ARGV == 0) {
print "USAGE: get_best.pl csvfile \n";
exit;
}
### this is a script to get the best "score"
my $input = $ARGV[0];
my $outfile = "bestofthebest.csv";
if (-e $outfile ) {
system "rm $outfile";
}
open(my $fh,'<',"$input") || die "could not open $input"; #try to open input
open (SUMMARY, ">>","$outfile") || die "could not open $outfile"; #open output file for writing
my $this_line = "";
my $do_next = 0;
while (<$fh>) {
chomp($_);
my $last_line = $this_line;
$this_line = $_;
if ($this_line =~ m/Seq/) {
print SUMMARY "$this_line\n";next;
}
my ($compound, $rank, $nnme, $G1, ..., $res1, $res2, $res3, $res4, $res5, $res6 ) = split(/\s+/, $this_line, 26);
my ($compound_old, $rank_old, $nnme_old, $G1_old, ..., $res1_old, $res2_old, $res3_old, $res4_old, $res5_old, $res6_old) = split(/\s+/, $last_line, 26);
foreach ($compound == $compound_old) {
if (($G1 >= $G1_old)){
print SUMMARY "$this_line\n";
print "\n $G1 G1 is >> $G1_old G1_old loop\n";
print "\n compound is $compound G1 is $G1\n";
$do_next = 1;
}
else {
$last_line = "";
$do_next = 0;
}
}
}
close ($fh);
close (SUMMARY);
Example input
This is what the input data looks like (I've left off some columns and rows, obviously)
10 8 3 -18.08 -1.4 -16.68 -15.94 -2.13 -9.45
11 10 4 -15.2 3.2 -18.4 -18.02 2.82 -5
11 5 4 -15.22 2.71 -17.92 -15.88 0.66 -4.51
11 7 4 -14.06 3.84 -17.89 -16.7 2.64 -5.73
11 4 4 -16.63 0.48 -17.1 -15.75 -0.87 -5.92
11 6 4 -15.21 1.83 -17.04 -18.41 3.21 -7
11 9 4 -15.18 1.82 -17 -16.56 1.38 -7.09
11 8 4 -14.98 1.93 -16.91 -16.78 1.79 -10.81
11 2 4 -18.75 -1.95 -16.8 -17.83 -0.92 -7.35
11 1 4 -19.67 -3.17 -16.5 -16.4 -3.27 -9.01
11 3 4 -16.69 -0.54 -16.14 -16.35 -0.34 -9.17
12 7 4 -19.54 -1.14 -18.41 -17.74 -1.81 -2.79
12 9 4 -19.09 -1.01 -18.08 -16.01 -3.09 -5.56
12 4 4 -19.48 -2.18 -17.3 -16.34 -3.14 -4
12 2 4 -19.86 -2.77 -17.1 -15.97 -3.9 -2.96
12 8 4 -19.49 -2.45 -17.03 -16.39 -3.1 -7.19
12 1 4 -20.28 -3.33 -16.95 -17.12 -3.16 -5.18
12 3 4 -18.78 -1.93 -16.86 -17.81 -0.98 -5.39
12 5 4 -19.63 -2.86 -16.77 -16.41 -3.22 -6.54
12 6 4 -19.81 -3.25 -16.56 -16.53 -3.27 -7.19
12 10 4 -19.39 -2.95 -16.44 -17.42 -1.97 -7.67
13 1 3 -13.05 6.35 -19.4 -18.71 5.66 -6.43
13 8 3 -21.44 -2.32 -19.11 -17.08 -4.36 -1.93
13 3 3 -16 2.94 -18.94 -19.24 3.24 -2.78
13 2 3 -13.79 4.9 -18.7 -17.35 3.56 -4.72
13 6 3 -22.08 -3.4 -18.68 -20.12 -1.96 -6.74
13 9 3 -18.98 -0.32 -18.66 -15.97 -3.01 -3.06
13 7 3 -20.4 -2.08 -18.32 -18.24 -2.17 -5.71
13 5 3 -19.94 -1.62 -18.32 -19.42 -0.52 -7.44
13 10 3 -19.26 -1.25 -18.01 -17.52 -1.74 -5.68
13 4 3 -17.75 -1.33 -16.42 -17.75 0 -9.15
14 9 3 -22.23 -3.43 -18.79 -16.68 -5.55 -3.91
14 5 3 -21.32 -2.95 -18.37 -18.08 -3.24 -6.03
14 7 3 -24.25 -6.29 -17.96 -18.78 -5.47 -9.21
14 6 3 -21.03 -3.14 -17.89 -19.17 -1.86 -10.11
14 4 3 -21.59 -3.93 -17.67 -19.32 -2.28 -6.55
14 1 3 -22.43 -4.79 -17.63 -18.09 -4.34 -5.63
Current Output:
10 2 3 -10.11 8.94 -19.04 -18.48 8.38 -4.09
11 5 4 -15.22 2.71 -17.92 -15.88 0.66 -4.51
12 7 4 -19.54 -1.14 -18.41 -17.74 -1.81 -2.79
12 6 4 -19.81 -3.25 -16.56 -16.53 -3.27 -7.19
13 8 3 -21.44 -2.32 -19.11 -17.08 -4.36 -1.93
14 9 3 -22.23 -3.43 -18.79 -16.68 -5.55 -3.91
15 10 4 -21.51 -1.51 -20 -17.63 -3.88 -2.45
16 5 4 -17.81 2.56 -20.37 -19.09 1.28 -1.19
16 2 4 -16.61 1.97 -18.58 -21.06 4.45 -6.47
Perhaps the following will be helpful:
use strict;
use warnings;
my %hash;
while (<DATA>) {
my ( $indVarID, $val ) = (split)[ 0, 3 ];
$hash{$indVarID} = [ $val, $_ ]
if !exists $hash{$indVarID}
or $hash{$indVarID}[0] < $val;
}
print $hash{$_}[1] for sort { $a <=> $b } keys %hash;
__DATA__
11 7 4 -14.06 3.84 -17.89 -16.7 2.64 -5.73
11 4 4 -16.63 0.48 -17.1 -15.75 -0.87 -5.92
11 6 4 -15.21 1.83 -17.04 -18.41 3.21 -7
11 9 4 -15.18 1.82 -17 -16.56 1.38 -7.09
11 8 4 -14.98 1.93 -16.91 -16.78 1.79 -10.81
11 2 4 -18.75 -1.95 -16.8 -17.83 -0.92 -7.35
11 1 4 -19.67 -3.17 -16.5 -16.4 -3.27 -9.01
11 3 4 -16.69 -0.54 -16.14 -16.35 -0.34 -9.17
12 7 4 -19.54 -1.14 -18.41 -17.74 -1.81 -2.79
12 9 4 -19.09 -1.01 -18.08 -16.01 -3.09 -5.56
12 4 4 -19.48 -2.18 -17.3 -16.34 -3.14 -4
12 2 4 -19.86 -2.77 -17.1 -15.97 -3.9 -2.96
12 8 4 -19.49 -2.45 -17.03 -16.39 -3.1 -7.19
12 1 4 -20.28 -3.33 -16.95 -17.12 -3.16 -5.18
12 3 4 -18.78 -1.93 -16.86 -17.81 -0.98 -5.39
12 5 4 -19.63 -2.86 -16.77 -16.41 -3.22 -6.54
12 6 4 -19.81 -3.25 -16.56 -16.53 -3.27 -7.19
12 10 4 -19.39 -2.95 -16.44 -17.42 -1.97 -7.67
13 1 3 -13.05 6.35 -19.4 -18.71 5.66 -6.43
13 8 3 -21.44 -2.32 -19.11 -17.08 -4.36 -1.93
13 3 3 -16 2.94 -18.94 -19.24 3.24 -2.78
13 2 3 -13.79 4.9 -18.7 -17.35 3.56 -4.72
13 6 3 -22.08 -3.4 -18.68 -20.12 -1.96 -6.74
13 9 3 -18.98 -0.32 -18.66 -15.97 -3.01 -3.06
13 7 3 -20.4 -2.08 -18.32 -18.24 -2.17 -5.71
13 5 3 -19.94 -1.62 -18.32 -19.42 -0.52 -7.44
13 10 3 -19.26 -1.25 -18.01 -17.52 -1.74 -5.68
13 4 3 -17.75 -1.33 -16.42 -17.75 0 -9.15
14 9 3 -22.23 -3.43 -18.79 -16.68 -5.55 -3.91
14 5 3 -21.32 -2.95 -18.37 -18.08 -3.24 -6.03
14 7 3 -24.25 -6.29 -17.96 -18.78 -5.47 -9.21
14 6 3 -21.03 -3.14 -17.89 -19.17 -1.86 -10.11
14 4 3 -21.59 -3.93 -17.67 -19.32 -2.28 -6.55
14 1 3 -22.43 -4.79 -17.63 -18.09 -4.34 -5.63
Output:
11 7 4 -14.06 3.84 -17.89 -16.7 2.64 -5.73
12 3 4 -18.78 -1.93 -16.86 -17.81 -0.98 -5.39
13 1 3 -13.05 6.35 -19.4 -18.71 5.66 -6.43
14 6 3 -21.03 -3.14 -17.89 -19.17 -1.86 -10.11
This builds a hash of arrays (HoA), where the key is the independent variable ID and the value is a reference to a two-element list. The zeroth element in the list is the value found in the record's fourth column. The first element is the record.
As records are being read, if a new value for an independent variable is greater than the older value (or if there wasn't an older one), the new value and record are stored in the list.
When done, the keys are numerically sorted and the records which contained the greatest value for each independent variable ID are printed.
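Since the question asks for the result to go into a new file, here is a sketch of the same hash-of-arrays idea wrapped in a sub that reads from one filehandle and writes to another. The sub name write_best_rows, the file name results.txt, and the rule that a non-numeric first field marks a header line are all assumptions on my part; bestofthebest.csv is the output name used in the question.

```perl
use strict;
use warnings;

# Copy the best row per ID from $in to $out.
# $col is the zero-based column holding the score (3 = fourth column).
sub write_best_rows {
    my ($in, $out, $col) = @_;
    my %best;    # ID => [ score, original line ]
    while (my $line = <$in>) {
        next unless $line =~ /\S/;          # skip blank lines
        my @fields = split ' ', $line;
        if ($fields[0] !~ /^\d+$/) {        # assumed: non-numeric first field is a header
            print {$out} $line;
            next;
        }
        my ($id, $score) = @fields[0, $col];
        $best{$id} = [ $score, $line ]
            if !exists $best{$id} or $best{$id}[0] < $score;
    }
    print {$out} $best{$_}[1] for sort { $a <=> $b } keys %best;
}

# Small demonstration with an in-memory input and STDOUT as the output.
my $sample = "11 7 4 -14.06\n11 4 4 -16.63\n12 7 4 -19.54\n12 3 4 -18.78\n";
open my $in, '<', \$sample or die "Cannot open in-memory input: $!";
write_best_rows($in, \*STDOUT, 3);
close $in;

# With real files (hypothetical input name) it would look like:
# open my $src, '<', 'results.txt'       or die "Cannot open results.txt: $!";
# open my $dst, '>', 'bestofthebest.csv' or die "Cannot open bestofthebest.csv: $!";
# write_best_rows($src, $dst, 3);
```

No pre-sorting of the input is needed, because every row is compared against the best one seen so far for its ID.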

How can I correctly calculate the lengths of fields in a CSV document using Perl?

I have a data set and would like to do a simple while-loop operation on it with a Perl script.
Here is a small extraction from the dataset:
"number","code","country","gamma","X1","X2","X3","X4","X5","X6"
1,"DZA","Algeria","0.01",7.44,47.3,0.46,0,0,0.13
2,"AGO","Angola","0.00",6.79,"NULL",0.21,1,0,0.28
3,"BEN","Benin","-0.01",7.02,38.9,0.27,1,0,0.05
4,"BWA","Botswana","0.06",6.28,45.7,0.42,1,0,0.07
5,"HVO","Burkina Faso","0.00",6.15,36.3,0.08,1,0,0.05
6,"BDI","Burundi","0.00",6.38,41.8,0.18,1,0,0
The script should count the length of every , separated field and store the highest values
into an array.
However, the saving doesn't work properly. Here is a part of the code:
@maxl = map length, @terms;
while(<INFILE>) {
$_ =~ s/[\"\n]//g ;
@terms = split/$sep/, $_;
@lengths = map length, @terms;
for($k = 0, $k <= $#terms, $k++) {
if($lengths[$k] > $maxl[$k]) {
$maxl[$k] = $lenghts[$k];
}
}
print "@lengths\n";
}
@maxl is initialised in an earlier part of the code, from the second line of the data set.
When I print @maxl just to see its values, I get:
1 3 7 4 4 4 4 1 1 5
In the while loop I used another print statement just to see the other values, I get:
1 3 6 4 4 4 4 1 1 4
1 3 5 5 4 4 4 1 1 4
1 3 8 4 4 4 4 1 1 4
1 3 12 4 4 4 4 1 1 4
1 3 7 4 4 4 4 1 1 1
1 3 8 4 4 4 4 1 1 4
1 3 10 4 4 4 4 1 1 4
1 3 16 5 4 4 4 1 1 4
2 3 4 5 3 4 4 1 1 4
2 3 7 4 4 4 4 1 1 4
2 3 5 4 4 4 4 1 1 4
2 3 5 4 4 4 4 1 1 4
2 3 8 4 4 4 4 1 1 4
2 3 5 4 4 4 1 1 1 4
The fourth column, for example, obviously has values greater than 3. The while loop was supposed to keep the greatest values and store them in @maxl.
What went wrong?
In the for loop, the commas are wrong; the three clauses of a C-style for loop must be separated by semicolons:
for($k = 0, $k <= $#terms, $k++)
However, after cleaning that up, there still seems to be a problem...
There's a typo here:
$maxl[$k] = $lenghts[$k];
for starters (which 'use strict' would have caught)
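For reference, a minimal sketch of the question's loop with both fixes applied (semicolons in the for header, and $lengths spelled correctly). The sub name max_field_lengths and the hard-coded comma separator are my assumptions; like the original, it strips quotes naively and so would mis-split fields that contain commas.

```perl
use strict;
use warnings;

# Return the longest length seen in each column.
# Assumes plain comma-separated fields with no embedded commas.
sub max_field_lengths {
    my @lines = @_;
    my @maxl;
    for my $line (@lines) {
        (my $clean = $line) =~ s/["\n]//g;    # strip quotes and newline, as in the question
        my @lengths = map { length } split /,/, $clean;
        for (my $k = 0; $k <= $#lengths; $k++) {    # semicolons, not commas
            $maxl[$k] = $lengths[$k]
                if !defined $maxl[$k] or $lengths[$k] > $maxl[$k];
        }
    }
    return @maxl;
}

my @rows = (
    '1,"DZA","Algeria","0.01",7.44,47.3,0.46,0,0,0.13',
    '2,"AGO","Angola","0.00",6.79,"NULL",0.21,1,0,0.28',
    '5,"HVO","Burkina Faso","0.00",6.15,36.3,0.08,1,0,0.05',
);
print join(' ', max_field_lengths(@rows)), "\n";    # prints "1 3 12 4 4 4 4 1 1 4"
```

Starting @maxl empty and treating an undefined slot as "smaller than anything" also removes the need to seed it from the second line of the file.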
Consider using Text::CSV for more reliable parsing of comma-separated data (it can also handle other separators):
#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;
my $csv = Text::CSV->new();
my @max_lengths;
while ( my $line = <INFILE> ) {
die "Unable to parse '$line'" unless $csv->parse($line);
my @column_lengths = map { length } $csv->fields();
for my $i ( 0 .. $#column_lengths ) {
if ( $column_lengths[$i] > ($max_lengths[$i] || 0) ) {
$max_lengths[$i] = $column_lengths[$i];
}
}
}
print "MAX LENGTHS OF EACH FIELD: @max_lengths\n";