My data looks like this,
1 20010101 945 A 6
1 20010101 946 B 4
1 20010101 947 P 3.5
1 20010101 950 A 5
1 20010101 951 P 4
1 20010101 952 P 4
1 20010101 1010 A 4
1 20010101 1011 P 4
2 20010101 940 A 3.5
2 20010101 1015 A 3
2 20010101 1113 B 3.5
2 20010101 1114 P 3.2
2 20010101 1115 B 3.4
2 20010101 1116 P 3.1
2 20010101 1119 P 3.6
I am trying to find all the lines with P and append to each of them the latest A and B values, matching on the first two columns (e.g., 1 and 20010101).
The result is expected to be like this,
1 20010101 947 P 3.5 6 4
1 20010101 951 P 4 5 4
1 20010101 952 P 4 5 4
1 20010101 1011 P 4 4 4
2 20010101 1114 P 3.2 3 3.5
2 20010101 1116 P 3.1 3 3.4
2 20010101 1119 P 3.6 3 3.4
Does this need sorting, or a hash, in Perl? I'm short of ideas; could anybody give me a hint? It would be much appreciated!
perl -ane 'if($F[3] eq "P"){ s/$/ $la $lb/; print; }else{ ($la,$lb) = ($F[3] eq "A")?($F[4],$lb):($la,$F[4]) }' data.txt
This is most simply solved with a simple if-elsif structure:
use strict;
use warnings;
my ($A, $B);
while (<DATA>) {
    my @data = split;
    if ($data[3] eq "A") {
        $A = $data[4];
    } elsif ($data[3] eq "B") {
        $B = $data[4];
    } elsif ($data[3] eq "P") {
        print join("\t", @data, $A, $B), "\n";
    }
}
__DATA__
1 20010101 945 A 6
1 20010101 946 B 4
1 20010101 947 P 3.5
1 20010101 950 A 5
1 20010101 951 P 4
1 20010101 952 P 4
1 20010101 1010 A 4
1 20010101 1011 P 4
2 20010101 940 A 3.5
2 20010101 1015 A 3
2 20010101 1113 B 3.5
2 20010101 1114 P 3.2
2 20010101 1115 B 3.4
2 20010101 1116 P 3.1
2 20010101 1119 P 3.6
Output:
1 20010101 947 P 3.5 6 4
1 20010101 951 P 4 5 4
1 20010101 952 P 4 5 4
1 20010101 1011 P 4 4 4
2 20010101 1114 P 3.2 3 3.5
2 20010101 1116 P 3.1 3 3.4
2 20010101 1119 P 3.6 3 3.4
You might want to compensate for possible empty/undefined/old values in $A and $B.
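For example, one way to guard against stale or missing values is to key the saved A and B readings by the first two columns, so a P line only ever picks up values recorded for its own group. The following is a minimal sketch of that idea (reading from a file given on the command line rather than __DATA__), not part of the answer above:
use strict;
use warnings;

my (%A, %B);    # latest A/B value per "col1 col2" group

while (<>) {
    my @f = split;
    my $key = "$f[0] $f[1]";
    if    ($f[3] eq "A") { $A{$key} = $f[4] }
    elsif ($f[3] eq "B") { $B{$key} = $f[4] }
    elsif ($f[3] eq "P") {
        # skip P lines seen before any A or B for this group
        next unless defined $A{$key} && defined $B{$key};
        print join("\t", @f, $A{$key}, $B{$key}), "\n";
    }
}
Run it as, say, perl get_latest.pl data.txt (both names are placeholders).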
I have an input file with the following 5 columns, and I want to average columns 3, 4, and 5 individually over the rows whose 2nd-column value is 5, and similarly for the rows whose 2nd-column value is 7 or 2.
PHE 5 2 4 6
PHE 5 4 6 4
PHE 5 4 2 8
TRP 7 5 5 9
TRP 7 5 7 1
TRP 7 5 7 3
TYR 2 4 4 4
TYR 2 4 4 0
TYR 2 4 5 3
and I want an output like this:
PHE 5 3.3 4 6
TRP 7 5 6.3 4.3
TYR 2 4 4.3 2.3
perl -lane'
$k = join "\t", splice(#F, 0, 2);
$h{$k}{c}++ or push(#r, $k);
$h{$k}{t}[$_] += $F[$_] for 0 .. $#F;
END {
$, ="\t";
for (#r) {
($t, $c) = #{$h{$_}}{"t", "c"};
print $_, map sprintf("%.1f", $_/$c)*1, #$t;
}
}
' file
output
PHE 5 3.3 4 6
TRP 7 5 6.3 4.3
TYR 2 4 4.3 2.3
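For readability, here is a longer sketch of the same idea, with running sums and a row count per key built from the first two columns; it is an illustration of the technique rather than a character-for-character expansion of the one-liner:
use strict;
use warnings;

my (%sum, %count, @order);

while (<>) {
    my @f = split;
    my $key = join "\t", splice(@f, 0, 2);    # e.g. "PHE\t5"
    push @order, $key unless $count{$key}++;  # remember first-seen order
    $sum{$key}[$_] += $f[$_] for 0 .. $#f;    # accumulate columns 3..5
}

for my $key (@order) {
    print join("\t", $key,
        map { sprintf("%.1f", $_ / $count{$key}) * 1 } @{ $sum{$key} }), "\n";
}
The sprintf(...) * 1 trick is the same one the one-liner uses: it rounds to one decimal place and then drops a trailing .0 so that 4.0 prints as 4.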
Nice solution mpapec.
I started the following solution as an experiment, to see if I could code something that would need only a single for loop and no END block. It devolved into 6 for loops instead, and into a perfect example of how never to code unless your goal is obfuscation.
Yep, it uses an external module. Yes, it's the stupidest code I'll ever post (I hope). But at the very least it might get a chuckle out of someone. And yep, it works! :)
use Array::Transpose;
use List::Util qw(sum max);
use strict;
use warnings;
my $g;
my $l;
print "$_\n" for map {
join ' ', map {sprintf "%-$_->[0]s", $_->[1]} transpose [$l, $_]
} grep {
$l = [map {max #$_} transpose [[map {length $_} #$_], $l || ()]]
} [qw(Txt Num Ave Ave Ave)], map {
my #c = transpose $_;
[$c[0][0], $c[1][0], map {map {/\./ ? sprintf("%.1f", $_) : $_} sum(#$_) / #$_} #c[2..$#c]]
} map {
$g && $g->[0][0] eq $_->[0] ? (push #$g, $_) && () : ($g = [$_])
} map {[split]} (<DATA>);
__DATA__
PHE 5 2 4 6
PHE 5 4 6 4
PHE 5 4 2 8
TRP 7 5 5 9
TRP 7 5 7 1
TRP 7 5 7 3
TYR 2 4 4 4
TYR 2 4 4 0
TYR 2 4 5 3
Outputs
Txt Num Ave Ave Ave
PHE 5 3.3 4 6
TRP 7 5 6.3 4.3
TYR 2 4 4.3 2.3
Here is a script without using modules. Try this:
#!/usr/bin/env perl
open(DATA, "<input.txt") or die "Couldn't open file file.txt, $!";
my %h=();
my %c=();
print "\n";
while(<DATA>){
my $temp=$_;
if($temp=~m/^([A-Z]{3})\s+([\d]+)\s+([\d]+)\s+([\d]+)\s+([\d]+)/is)
{
my $key=$1;
$h{$key}{1} +=$2;
$h{$key}{2} +=$3;
$h{$key}{3} +=$4;
$h{$key}{4} +=$5;
if($c{$key})
{
$c{$key}++;
}
else
{
$c{$key}=1;
}
}
}
foreach $key (sort(keys %h)) {
#print $key.'='.$h{$key}{1}/$c{$key}." ".$h{$key}{2}/$c{$key}." ".$h{$key}{3}/$c{$key}." ".$h{$key}{4}/$c{$key};
printf("%s %d %.1f %.1f %.1f", $key, $h{$key}{1}/$c{$key},$h{$key}{2}/$c{$key},$h{$key}{3}/$c{$key},$h{$key}{4}/$c{$key});
print "\n";
}
print "\n";
close(DATA);
______OUTPUT________
PHE 5 3.3 4.0 6.0
TRP 7 5.0 6.3 4.3
TYR 2 4.0 4.3 2.3
I have a text file (the first two lines are character spacings):
1 2 3 4 5 6 7 8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
ATOM 1 N1 SPINA 3 30.616 29.799 14.979 1.00 20.00 S N
ATOM 2 N1 SPINA 3 28.146 28.381 13.950 1.00 20.00 S N
ATOM 3 N1 SPINA 3 27.605 28.239 14.037 1.00 20.00 S N
ATOM 4 N1 SPINA 3 30.333 29.182 15.464 1.00 20.00 S N
ATOM 5 N1 SPINA 3 29.608 29.434 14.333 1.00 20.00 S N
ATOM 6 N1 SPINA 3 29.303 29.830 13.317 1.00 20.00 S N
ATOM 7 N1 SPINA 3 28.963 31.116 13.472 1.00 20.00 S N
ATOM 8 N1 SPINA 3 28.859 28.743 13.828 1.00 20.00 S N
ATOM 9 N1 SPINA 3 29.699 30.575 14.564 1.00 20.00 S N
ATOM 10 N1 SPINA 3 29.518 29.194 15.301 1.00 20.00 S N
I want to edit it and make it like:
1 2 3 4 5 6 7 8
12345678901234567890123456789012345678901234567890123456789012345678901234567890
ATOM 1 N001 SPINA 3 30.616 29.799 14.979 1.00 20.00 S N
ATOM 2 N002 SPINA 3 28.146 28.381 13.950 1.00 20.00 S N
ATOM 3 N003 SPINA 3 27.605 28.239 14.037 1.00 20.00 S N
ATOM 4 N004 SPINA 3 30.333 29.182 15.464 1.00 20.00 S N
ATOM 5 N005 SPINA 3 29.608 29.434 14.333 1.00 20.00 S N
ATOM 6 N006 SPINA 3 29.303 29.830 13.317 1.00 20.00 S N
ATOM 7 N007 SPINA 3 28.963 31.116 13.472 1.00 20.00 S N
ATOM 8 N008 SPINA 3 28.859 28.743 13.828 1.00 20.00 S N
ATOM 9 N009 SPINA 3 29.699 30.575 14.564 1.00 20.00 S N
ATOM 10 N010 SPINA 3 29.518 29.194 15.301 1.00 20.00 S N
The number of spaces between each column are important and the list of atoms needs to go up to 190 (N001-N190). Thus I would like to replace characters 13-16 (" N1 ") in file 1 with ("N001") and keep the remainder of the file in the original spacing.
You don't need 10 long lines of sample input to demonstrate the problem or the solution:
$ cat file
ATOM 1 N1 SPINA 3
ATOM 2 N1 SPINA 3
ATOM 10 N1 SPINA 3
$ awk '{print substr($0,1,12) sprintf("N%03d",$2) substr($0,17)}' file
ATOM 1 N001 SPINA 3
ATOM 2 N002 SPINA 3
ATOM 10 N010 SPINA 3
I'm assuming we could use $2 as the numeric part of the 3rd field. It seems to increment sequentially with your line numbers. Using NR might be an alternative. If neither of those is actually what you want, post some more representative sample input/output.
Also, note that any solution that involves assigning to a field (e.g. $3=...) WILL cause awk to recompile the line using the value of OFS as the field separator and so will change your spacing.
Oh, and if those 2 initial lines of character spacings are really present in your files, this is the tweak:
$ cat file
1 2
12345678901234567890123456
ATOM 1 N1 SPINA 3
ATOM 2 N1 SPINA 3
ATOM 10 N1 SPINA 3
$ awk 'NR>2{$0 = substr($0,1,12) sprintf("N%03d",$2) substr($0,17)} 1' file
1 2
12345678901234567890123456
ATOM 1 N001 SPINA 3
ATOM 2 N002 SPINA 3
ATOM 10 N010 SPINA 3
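Since most of this page leans on Perl, here is a rough Perl equivalent of the same fixed-width edit, offered as a sketch. It assumes, as above, that the atom name occupies characters 13-16 and that the serial number is the second whitespace-separated field; the /^ATOM/ test is just one way to skip the two ruler lines:
use strict;
use warnings;

while (<>) {
    if (/^ATOM/) {
        my $serial = (split)[1];                        # field 2: 1, 2, ... 10
        substr($_, 12, 4) = sprintf("N%03d", $serial);  # overwrite characters 13-16 in place
    }
    print;
}
Because it edits the line with substr instead of splitting and re-joining fields, the original spacing of the rest of the line is preserved, which is the same reason the awk answer avoids assigning to $3.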
Try :
$ awk '{$3=substr($3,1,1) sprintf("%03d",$2)}1' OFS=\\t file
Note: OFS will be a tab.
If you want to try this on a Solaris/SunOS system, change awk to /usr/xpg4/bin/awk, /usr/xpg6/bin/awk, or nawk.
--edit--
If you want to increment with the line number instead:
$ awk '{$3=substr($3,1,1) sprintf("%03d",NR)}1' OFS=\\t file
Here is yet another way:
awk 'sub(/.$/,sprintf("%03d",NR),$3)' OFS='\t' file
Output:
$ awk 'sub(/.$/,sprintf("%03d",NR),$3)' OFS='\t' file
ATOM 1 N001 SPINA 3 30.616 29.799 14.979 1.00 20.00 S N
ATOM 2 N002 SPINA 3 28.146 28.381 13.950 1.00 20.00 S N
ATOM 3 N003 SPINA 3 27.605 28.239 14.037 1.00 20.00 S N
ATOM 4 N004 SPINA 3 30.333 29.182 15.464 1.00 20.00 S N
ATOM 5 N005 SPINA 3 29.608 29.434 14.333 1.00 20.00 S N
ATOM 6 N006 SPINA 3 29.303 29.830 13.317 1.00 20.00 S N
ATOM 7 N007 SPINA 3 28.963 31.116 13.472 1.00 20.00 S N
ATOM 8 N008 SPINA 3 28.859 28.743 13.828 1.00 20.00 S N
ATOM 9 N009 SPINA 3 29.699 30.575 14.564 1.00 20.00 S N
ATOM 10 N010 SPINA 3 29.518 29.194 15.301 1.00 20.00 S N
If you are interested in solving it with pure shell, here is the code:
while IFS="\n" read -r line
do
n=${line:9:3}
printf "%sN%03d%s\n" "${line:0:12}" $n "${line:16}"
done < file
awk '$3="N"sprintf("%03d",$2)' OFS='\t' infile.txt
Result
ATOM 1 N001 SPINA 3 30.616 29.799 14.979 1.00 20.00 S N
ATOM 2 N002 SPINA 3 28.146 28.381 13.950 1.00 20.00 S N
ATOM 3 N003 SPINA 3 27.605 28.239 14.037 1.00 20.00 S N
ATOM 4 N004 SPINA 3 30.333 29.182 15.464 1.00 20.00 S N
ATOM 5 N005 SPINA 3 29.608 29.434 14.333 1.00 20.00 S N
ATOM 6 N006 SPINA 3 29.303 29.830 13.317 1.00 20.00 S N
ATOM 7 N007 SPINA 3 28.963 31.116 13.472 1.00 20.00 S N
ATOM 8 N008 SPINA 3 28.859 28.743 13.828 1.00 20.00 S N
ATOM 9 N009 SPINA 3 29.699 30.575 14.564 1.00 20.00 S N
ATOM 10 N010 SPINA 3 29.518 29.194 15.301 1.00 20.00 S N
I have a population generated by pop=[pop;x;y;z;cst,fr]; where the first 4 rows are x, the next 3 rows are y, and the following 8 rows are z. cst is the sum of column 1 and fr is the calculated failure rate of column 2.
6 0.876
5 0.99
3 0.939
6 0.876
4 0.837
7 0.959
4 0.953
4 0.873
0 0
5 0.95
3 0.855
4 0.873
4 0.873
5 0.95
6 0.951
66 0.00032352
6 0.876
6 0.876
6 0.965
6 0.965
4 0.953
4 0.837
4 0.953
0 0
3 0.855
6 0.951
5 0.95
0 0
0 0
3 0.855
6 0.951
59 0.00038143
6 0.965
5 0.888
6 0.965
3 0.863
7 0.889
7 0.959
4 0.953
7 0.915
6 0.968
3 0.855
3 0.855
8 0.942
4 0.873
3 0.855
8 0.942
80 0.0002327
How can I sort on the specific rows (16, 32, 48), followed by the unchanged rows (1:15, 17:31, 33:47)?
for example:
6 0.876
6 0.876
6 0.965
6 0.965
4 0.953
4 0.837
4 0.953
0 0
3 0.855
6 0.951
5 0.95
0 0
0 0
3 0.855
6 0.951
59 0.00038143
6 0.876
5 0.99
3 0.939
6 0.876
4 0.837
7 0.959
4 0.953
4 0.873
0 0
5 0.95
3 0.855
4 0.873
4 0.873
5 0.95
6 0.951
66 0.00032352
6 0.965
5 0.888
6 0.965
3 0.863
7 0.889
7 0.959
4 0.953
7 0.915
6 0.968
3 0.855
3 0.855
8 0.942
4 0.873
3 0.855
8 0.942
80 0.0002327
Please help!
It's not clear exactly what order you want, but if you know you want some specific row ordering, such as rows (16,32,48) followed by the unchanged rows (1:15,17:31,33:47), then you can use indexing like this:
n = [16, 32, 48, 1:15, 17:31, 33:47]; % indexes for sorting
popsort = pop(n,:); %index into pop by rows
There are various tricks you could use to create the index vector n.
Now, if you are running a loop and appending new values to pop like this:
pop = [];
for n = 1:3
    % calculate x, y, z, etc.
    pop=[pop;x;y;z;cst,fr];
end
Then it may be better to pre-allocate the matrix pop and put your values where you want them in the first place, or to use two variables, one containing your x,y,z values and the other containing the cst,fr values. That would avoid the need to sort the rows afterwards:
m = 3; % works for any number of loops
pop = zeros(m*15,2);
cst_fr = zeros(m,2);
for n = 0:m-1
    % calculate x, y, z, etc.
    pop(1+n*15:15+n*15,:)=[x;y;z];
    cst_fr(n+1,:)=[cst,fr];
end
I am new to MATLAB and I am not familiar with arrays of matrices. I have a number of n-by-6 matrices:
<26x6 double>
<21x6 double>
<27x6 double>
<36x6 double>
<29x6 double>
<30x6 double>
....
Each matrix is of this type:
>> Matrix{1,1}
A B C D E F
1 2 6 223 735064.287500000 F11
2 3 6 223 735064.288194445 F12
3 4 6 223 735064.288888889 F13
4 5 6 223 735064.290277778 F14
>> Matrix{2,1}
A B C D E F
1 2 6 223 735064.700694445 F21
2 3 6 223 735064.701388889 F22
3 4 6 223 735064.702083333 F23
4 5 6 223 735064.702777778 F24
>> Matrix{3,1}
A B C D E F
1 2 7 86 735064.3541666666 F31
2 3 7 86 735064.3548611112 F32
3 4 7 86 735064.3555555555 F33
4 5 7 86 735064.3562499999 F34
5 6 7 86 735064.702777778 F35
>> Matrix{4,1}
A B C D E F
1 2 7 86 735064.3569444444 F41
2 3 7 86 735064.3576388888 F42
3 4 7 86 735064.3583333333 F43
4 5 7 86 735064.3590277778 F44
5 6 6 86 735064.702777778 F45
E and F are dates in datenum format; specifically, F is the time difference.
Considering all matrices at once, I would like to sum the values of column F across all the matrices that have equal values in columns A, B, and D.
For each value of column D (the bus number), I would like to obtain a new matrix like the following:
A B C D H
1 2 6 223 F11+F21
2 3 6 223 F12+F22
3 4 6 223 F13+F23
4 5 6 223 F14+F24
A B C D H
1 2 7 86 F31+F41
2 3 7 86 F32+F42
3 4 7 86 F33+F43
4 5 7 86 F34+F44
5 6 7 86 F35+F45
Thank you in advance for your help!
This approach should get you started. I suggest setting up a matrix that stores the comparison between columns 1, 2, and 4. Based on that matrix you can then generate your output matrix. This saves you nested if statements and checks in your loop.
Here's an example (please note that I changed row 3 of Matrix{1,1}):
Matrix{1,1} = [ ...
1 2 6 223 735064.287500000 1;
2 3 6 223 735064.288194445 2;
3 4 6 223 735064.288888889 3;
4 5 6 223 735064.290277778 4];
Matrix{2,1} = [ ...
1 2 6 223 735064.700694445 10;
2 3 6 223 735064.701388889 10;
2 4 6 223 735064.702083333 10;
4 5 6 223 735064.702777778 10];
COMP = Matrix{1,1}(:,[1:2 4])==Matrix{2,1}(:,[1:2 4]);
a = 1;
for i=1:size(Matrix{1,1},1)
    if sum(COMP(i,:)) == 3
        SUM{1,1}(a,1:5) = Matrix{1,1}(i,1:5);
        SUM{1,1}(a,6) = Matrix{1,1}(i,6) + Matrix{2,1}(i,6);
        a = a + 1;
    end
end
The matrix COMP stores a 1 for each element that is the same in Matrix{1,1} and Matrix{2,1} when comparing columns 1, 2 and 4.
This reduces the if-statement to a check if all elements in a row agree (hence sum == 3). If that condition is satisfied, a new matrix is generated (SUM{1,1}) which sums the entries in column 6, in this case:
SUM{1,1}(:,6) =
11
12
14
I have a CSV table where I have the merged data for 1024 independent variables and 25 dependent variables that are associated with them. For each independent variable (called 1 .. 1024), I have 10 different outcomes. I would like to
choose the best result for each independent variable, and
pipe the line containing that information into a new CSV file.
It seems like a fairly easy thing to ask of perl, and maybe it would be simple to do with a hash of an array of an array, but I'm still confused about how I could implement something like that for this collection of data.
Current code
I found a very helpful Q&A from 2009 on printing matching lines. It works fairly well after some tinkering, but a few issues remain:
I have to pre-sort the file so that my maximum value is the first value that appears for each case.
I also miss out on getting the best result for the first independent variable and
in some instances I get multiple lines returned to me instead of just the maximum value.
I'm fairly sure there must be an easier way to do this, and I would greatly appreciate any help and/or constructive criticism on my (ripped-off) script.
Thank you!
This is what I have so far:
#!/usr/bin/perl
use warnings;
use strict;
unless ($#ARGV == 0) {
    print "USAGE: get_best.pl csvfile \n";
    exit;
}
### this is a script to get the best "score"
my $input = $ARGV[0];
my $outfile = "bestofthebest.csv";
if (-e $outfile ) {
    system "rm $outfile";
}
open(my $fh,'<',"$input") || die "could not open $input"; #try to open input
open (SUMMARY, ">>","$outfile") || die "could not open $outfile"; #open output file for writing
my $this_line = "";
my $do_next = 0;
while (<$fh>) {
    chomp($_);
    my $last_line = $this_line;
    $this_line = $_;
    if ($this_line =~ m/Seq/) {
        print SUMMARY "$this_line\n"; next;
    }
    my ($compound, $rank, $nnme, $G1, ..., $res1, $res2, $res3, $res4, $res5, $res6 ) = split(/\s+/, $this_line, 26);
    my ($compound_old, $rank_old, $nnme_old, $G1_old, ..., $res1_old, $res2_old, $res3_old, $res4_old, $res5_old, $res6_old) = split(/\s+/, $last_line, 26);
    foreach ($compound == $compound_old) {
        if (($G1 >= $G1_old)){
            print SUMMARY "$this_line\n";
            print "\n $G1 G1 is >> $G1_old G1_old loop\n";
            print "\n compound is $compound G1 is $G1\n";
            $do_next = 1;
        }
        else {
            $last_line = "";
            $do_next = 0;
        }
    }
}
close ($fh);
close (SUMMARY);
Example input
This is what the input data looks like (I've left off some columns and rows, obviously)
10 8 3 -18.08 -1.4 -16.68 -15.94 -2.13 -9.45
11 10 4 -15.2 3.2 -18.4 -18.02 2.82 -5
11 5 4 -15.22 2.71 -17.92 -15.88 0.66 -4.51
11 7 4 -14.06 3.84 -17.89 -16.7 2.64 -5.73
11 4 4 -16.63 0.48 -17.1 -15.75 -0.87 -5.92
11 6 4 -15.21 1.83 -17.04 -18.41 3.21 -7
11 9 4 -15.18 1.82 -17 -16.56 1.38 -7.09
11 8 4 -14.98 1.93 -16.91 -16.78 1.79 -10.81
11 2 4 -18.75 -1.95 -16.8 -17.83 -0.92 -7.35
11 1 4 -19.67 -3.17 -16.5 -16.4 -3.27 -9.01
11 3 4 -16.69 -0.54 -16.14 -16.35 -0.34 -9.17
12 7 4 -19.54 -1.14 -18.41 -17.74 -1.81 -2.79
12 9 4 -19.09 -1.01 -18.08 -16.01 -3.09 -5.56
12 4 4 -19.48 -2.18 -17.3 -16.34 -3.14 -4
12 2 4 -19.86 -2.77 -17.1 -15.97 -3.9 -2.96
12 8 4 -19.49 -2.45 -17.03 -16.39 -3.1 -7.19
12 1 4 -20.28 -3.33 -16.95 -17.12 -3.16 -5.18
12 3 4 -18.78 -1.93 -16.86 -17.81 -0.98 -5.39
12 5 4 -19.63 -2.86 -16.77 -16.41 -3.22 -6.54
12 6 4 -19.81 -3.25 -16.56 -16.53 -3.27 -7.19
12 10 4 -19.39 -2.95 -16.44 -17.42 -1.97 -7.67
13 1 3 -13.05 6.35 -19.4 -18.71 5.66 -6.43
13 8 3 -21.44 -2.32 -19.11 -17.08 -4.36 -1.93
13 3 3 -16 2.94 -18.94 -19.24 3.24 -2.78
13 2 3 -13.79 4.9 -18.7 -17.35 3.56 -4.72
13 6 3 -22.08 -3.4 -18.68 -20.12 -1.96 -6.74
13 9 3 -18.98 -0.32 -18.66 -15.97 -3.01 -3.06
13 7 3 -20.4 -2.08 -18.32 -18.24 -2.17 -5.71
13 5 3 -19.94 -1.62 -18.32 -19.42 -0.52 -7.44
13 10 3 -19.26 -1.25 -18.01 -17.52 -1.74 -5.68
13 4 3 -17.75 -1.33 -16.42 -17.75 0 -9.15
14 9 3 -22.23 -3.43 -18.79 -16.68 -5.55 -3.91
14 5 3 -21.32 -2.95 -18.37 -18.08 -3.24 -6.03
14 7 3 -24.25 -6.29 -17.96 -18.78 -5.47 -9.21
14 6 3 -21.03 -3.14 -17.89 -19.17 -1.86 -10.11
14 4 3 -21.59 -3.93 -17.67 -19.32 -2.28 -6.55
14 1 3 -22.43 -4.79 -17.63 -18.09 -4.34 -5.63
Current Output:
10 2 3 -10.11 8.94 -19.04 -18.48 8.38 -4.09
11 5 4 -15.22 2.71 -17.92 -15.88 0.66 -4.51
12 7 4 -19.54 -1.14 -18.41 -17.74 -1.81 -2.79
12 6 4 -19.81 -3.25 -16.56 -16.53 -3.27 -7.19
13 8 3 -21.44 -2.32 -19.11 -17.08 -4.36 -1.93
14 9 3 -22.23 -3.43 -18.79 -16.68 -5.55 -3.91
15 10 4 -21.51 -1.51 -20 -17.63 -3.88 -2.45
16 5 4 -17.81 2.56 -20.37 -19.09 1.28 -1.19
16 2 4 -16.61 1.97 -18.58 -21.06 4.45 -6.47
Perhaps the following will be helpful:
use strict;
use warnings;
my %hash;
while (<DATA>) {
    my ( $indVarID, $val ) = (split)[ 0, 3 ];
    $hash{$indVarID} = [ $val, $_ ]
        if !exists $hash{$indVarID}
            or $hash{$indVarID}[0] < $val;
}
print $hash{$_}[1] for sort { $a <=> $b } keys %hash;
__DATA__
11 7 4 -14.06 3.84 -17.89 -16.7 2.64 -5.73
11 4 4 -16.63 0.48 -17.1 -15.75 -0.87 -5.92
11 6 4 -15.21 1.83 -17.04 -18.41 3.21 -7
11 9 4 -15.18 1.82 -17 -16.56 1.38 -7.09
11 8 4 -14.98 1.93 -16.91 -16.78 1.79 -10.81
11 2 4 -18.75 -1.95 -16.8 -17.83 -0.92 -7.35
11 1 4 -19.67 -3.17 -16.5 -16.4 -3.27 -9.01
11 3 4 -16.69 -0.54 -16.14 -16.35 -0.34 -9.17
12 7 4 -19.54 -1.14 -18.41 -17.74 -1.81 -2.79
12 9 4 -19.09 -1.01 -18.08 -16.01 -3.09 -5.56
12 4 4 -19.48 -2.18 -17.3 -16.34 -3.14 -4
12 2 4 -19.86 -2.77 -17.1 -15.97 -3.9 -2.96
12 8 4 -19.49 -2.45 -17.03 -16.39 -3.1 -7.19
12 1 4 -20.28 -3.33 -16.95 -17.12 -3.16 -5.18
12 3 4 -18.78 -1.93 -16.86 -17.81 -0.98 -5.39
12 5 4 -19.63 -2.86 -16.77 -16.41 -3.22 -6.54
12 6 4 -19.81 -3.25 -16.56 -16.53 -3.27 -7.19
12 10 4 -19.39 -2.95 -16.44 -17.42 -1.97 -7.67
13 1 3 -13.05 6.35 -19.4 -18.71 5.66 -6.43
13 8 3 -21.44 -2.32 -19.11 -17.08 -4.36 -1.93
13 3 3 -16 2.94 -18.94 -19.24 3.24 -2.78
13 2 3 -13.79 4.9 -18.7 -17.35 3.56 -4.72
13 6 3 -22.08 -3.4 -18.68 -20.12 -1.96 -6.74
13 9 3 -18.98 -0.32 -18.66 -15.97 -3.01 -3.06
13 7 3 -20.4 -2.08 -18.32 -18.24 -2.17 -5.71
13 5 3 -19.94 -1.62 -18.32 -19.42 -0.52 -7.44
13 10 3 -19.26 -1.25 -18.01 -17.52 -1.74 -5.68
13 4 3 -17.75 -1.33 -16.42 -17.75 0 -9.15
14 9 3 -22.23 -3.43 -18.79 -16.68 -5.55 -3.91
14 5 3 -21.32 -2.95 -18.37 -18.08 -3.24 -6.03
14 7 3 -24.25 -6.29 -17.96 -18.78 -5.47 -9.21
14 6 3 -21.03 -3.14 -17.89 -19.17 -1.86 -10.11
14 4 3 -21.59 -3.93 -17.67 -19.32 -2.28 -6.55
14 1 3 -22.43 -4.79 -17.63 -18.09 -4.34 -5.63
Output:
11 7 4 -14.06 3.84 -17.89 -16.7 2.64 -5.73
12 3 4 -18.78 -1.93 -16.86 -17.81 -0.98 -5.39
13 1 3 -13.05 6.35 -19.4 -18.71 5.66 -6.43
14 6 3 -21.03 -3.14 -17.89 -19.17 -1.86 -10.11
This builds a hash of arrays (HoA), where the key is the independent variable ID and the value is a reference to a two-element list. The zeroth element in the list is the value found in the record's fourth column. The first element is the record.
As records are being read, if a new value for an independent variable is greater than the older value (or if there wasn't an older one), the new value and record are stored in the list.
When done, the keys are numerically sorted and the records which contained the greatest value for each independent variable ID are printed.
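To plug this into the question's file handling (a CSV named on the command line, results written to bestofthebest.csv), the loop could be wrapped roughly like this; the /Seq/ header check and the column positions 0 and 3 are carried over from the question's script and sample data, and should be adjusted if the real file differs:
#!/usr/bin/perl
use strict;
use warnings;

die "USAGE: get_best.pl csvfile\n" unless @ARGV == 1;

open my $in,  '<', $ARGV[0]            or die "could not open $ARGV[0]: $!";
open my $out, '>', 'bestofthebest.csv' or die "could not open bestofthebest.csv: $!";

my %best;
while (<$in>) {
    if (/Seq/) { print $out $_; next; }           # pass the header straight through
    my ( $compound, $G1 ) = (split)[ 0, 3 ];      # independent variable ID and score
    $best{$compound} = [ $G1, $_ ]
        if !exists $best{$compound} or $best{$compound}[0] < $G1;
}
print $out $best{$_}[1] for sort { $a <=> $b } keys %best;
Opening the output with '>' instead of '>>' also removes the need for the rm step in the original script.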